Skip to content

feat: add Iteration 3e for memory security and integrity (OWASP ASI06)#5

Merged
krokoko merged 1 commit into
mainfrom
feature/memory-security-roadmap
Apr 2, 2026
Merged

feat: add Iteration 3e for memory security and integrity (OWASP ASI06)#5
krokoko merged 1 commit into
mainfrom
feature/memory-security-roadmap

Conversation

@theagenticguy

Copy link
Copy Markdown
Contributor

Summary

Adds a new roadmap iteration (3e) and updates design documents to address memory poisoning — classified as OWASP ASI06 (Memory & Context Poisoning) in the 2026 Top 10 for Agentic Applications. Based on a deep research report covering 33 sources (academic papers, CVE disclosures, OWASP standards, and industry analyses).

Changes across 3 files (docs only, no code changes):

  • ROADMAP.md — New Iteration 3e with 4 implementation phases (input hardening → trust-aware retrieval → detection & response → advanced protections), addressing 9 identified memory-layer security gaps
  • MEMORY.md — New "Memory Security Analysis" section: full threat taxonomy (intentional + emergent corruption), 9-gap table with severity ratings, 6-layer defense architecture diagram, existing mitigations catalog, academic references
  • SECURITY.md — OWASP ASI06 classification, expanded attack vectors (MINJA, GitHub issue injection, experience grafting, RAG poisoning, emergent self-corruption), 6-layer defense requirements, updated Known Limitations

Key findings from research

  1. MINJA achieves 95%+ injection success against undefended agent memory (query-only, no direct access needed)
  2. Emergent self-corruption (hallucination crystallization, feedback loops) is equally dangerous — no attacker signature
  3. Defense requires 6 layers working together; no single control is sufficient
  4. The platform has strong network security but 9 memory-layer gaps — most critically, loadMemoryContext() injects retrieved memory without any validation

9 identified gaps

# Gap Severity
1 No memory content validation in loadMemoryContext() Critical
2 No source provenance tracking on memory entries Critical
3 GitHub issue content injected without trust differentiation Critical
4 No trust scoring at retrieval High
5 No memory integrity checking (hashing/signatures) High
6 No anomaly detection on memory patterns High
7 No memory rollback capability High
8 No write-ahead validation (guardian pattern) Medium
9 No circuit breaker for behavioral anomalies Medium

4-phase implementation roadmap

  • Phase 1 — Input hardening: content sanitization, GitHub issue filtering, provenance metadata, SHA-256 integrity hashing
  • Phase 2 — Trust-aware retrieval: trust scoring, temporal decay TTLs, memory validation Lambda
  • Phase 3 — Detection & response: CloudWatch anomaly detection, circuit breaker, quarantine API, snapshot rollback
  • Phase 4 — Advanced protections: guardian model, cross-task drift detection, Merkle tree provenance, MINJA/AgentPoison red teaming

Test plan

  • Verify ROADMAP.md renders correctly with new Iteration 3e section and updated summary
  • Verify MEMORY.md threat taxonomy tables, gap analysis table, and defense architecture diagram render correctly
  • Verify SECURITY.md expanded attack vectors and updated Known Limitations are consistent with MEMORY.md
  • Verify cross-references between ROADMAP.md ↔ MEMORY.md ↔ SECURITY.md resolve correctly
  • Confirm no breaking changes to existing content (docs-only PR)

Address OWASP ASI06 (Memory & Context Poisoning) with a 4-phase
implementation plan based on deep research (33 sources).

ROADMAP.md:
- Add Iteration 3e between 3d and 4 with 4 phases:
  Phase 1 (input hardening), Phase 2 (trust-aware retrieval),
  Phase 3 (detection and response), Phase 4 (advanced protections)
- Update summary section with 3e entry

MEMORY.md:
- Add Memory Security Analysis section with full threat taxonomy
  (intentional and emergent corruption vectors)
- Document 9 identified gaps with severity ratings
- Define 6-layer defense architecture
- Catalog existing partial mitigations
- Add academic and industry references

SECURITY.md:
- Add OWASP ASI06 classification and context
- Expand attack vectors beyond PR review comments (MINJA,
  GitHub issue injection, experience grafting, RAG poisoning,
  emergent self-corruption)
- Document 6-layer defense architecture requirements
- Update Known Limitations with memory security gaps
@theagenticguy theagenticguy requested a review from a team as a code owner April 2, 2026 19:50
@theagenticguy theagenticguy changed the title Add Iteration 3e: Memory security and integrity (OWASP ASI06) docs: Memory security and integrity Apr 2, 2026
@theagenticguy theagenticguy changed the title docs: Memory security and integrity docs: add Iteration 3e for memory security and integrity (OWASP ASI06) Apr 2, 2026
@theagenticguy theagenticguy changed the title docs: add Iteration 3e for memory security and integrity (OWASP ASI06) feat: add Iteration 3e for memory security and integrity (OWASP ASI06) Apr 2, 2026
@krokoko krokoko merged commit af6dafe into main Apr 2, 2026
7 of 12 checks passed
@theagenticguy

Copy link
Copy Markdown
Contributor Author

CI Note: The build job is failing due to a transient GitHub API rate limit (403) when mise tries to install trivy@0.69.2. This is unrelated to the docs-only changes in this PR — main branch builds pass fine. Re-running to clear the flake.

@krokoko krokoko deleted the feature/memory-security-roadmap branch April 6, 2026 23:07
scoropeza pushed a commit to scoropeza/sample-autonomous-cloud-coding-agents that referenced this pull request May 5, 2026
…rokoko review aws-samples#1, aws-samples#5, aws-samples#9, aws-samples#12)

Four related fixes on the fanout + github-comment surface, from the
code review on PR aws-samples#52. Grouped because they share the narrative
"defense-in-depth on the fanout dispatcher" — any one landing without
the others leaves a hole.

## Findings addressed

**aws-samples#1 — Fanout handler returns void despite reportBatchItemFailures: true**

The ``FanOutConsumer`` construct (``cdk/src/constructs/fanout-consumer.ts:146``)
has ``reportBatchItemFailures: true`` on its DDB Stream event-source
mapping. The handler returned ``void``, so Lambda retried the entire
batch on any unhandled throw instead of isolating the poisonous
record. Combined with aws-samples#5 this could cascade into retry storms and
violated the per-task ordering guarantee we rely on (§6.4, AD-9).

Fix: handler return type becomes ``Promise<DynamoDBBatchResponse>``.
Per-record processing is wrapped in try/catch; caught throws push
``{ itemIdentifier: record.eventID }`` to ``batchItemFailures`` and
emit ``fanout.record.failed`` warn. Final ``fanout.batch.complete``
log grows a ``failed`` count.

Note: ``DynamoDBStreamHandler`` constrains return to
``void | Promise<void>``, so the handler is typed as a plain 3-arg
async function. Lambda's runtime accepts either shape; existing
tests (passing ``event, context, cb``) work unchanged.

**aws-samples#5 — Unhandled exception in routeEvent crashes batch**

``routeEvent`` uses ``Promise.allSettled`` internally, but
``resolveTokenSecretArn`` can throw ``AccessDeniedException``
SYNCHRONOUSLY before the ``allSettled`` guard is reached. The new
per-record try/catch from aws-samples#1 catches these too.

**aws-samples#9 — renderCommentBody not self-defending against uncoerced DDB strings**

The ``.toFixed(4)`` call on ``costUsd`` is the same bug class as the
``toFixed is not a function`` crash we fixed at the fanout boundary
in commit 9fe704e. Today the sole call site coerces via the shared
helper; a future caller that forgets to would crash.

Fix: ``renderCommentBody`` coerces ``durationS`` and ``costUsd``
internally via the shared ``coerceNumericOrNull`` helper (second
line of defense; caller's coercion remains the first). Widened
``CommentBodyInput`` fields to ``number | string | null`` to
honestly model the DDB Document-client boundary.

**aws-samples#12 — Markdown injection possible via prUrl in GitHub comment body**

``prUrl`` was interpolated directly into a Markdown link target
(``[link](${input.prUrl})``). A crafted URL containing ``)`` / ``|``
/ ``\n`` could break the table layout or inject content, and a
``javascript:`` scheme could produce a click-to-execute link on some
Markdown renderers.

Fix: new exported ``sanitizeMarkdownLinkTarget`` helper in
``shared/github-comment.ts`` rejects URLs containing
``\r\n\t\s)|]"<>`` characters, validates via ``new URL()``, and
rejects non-http(s) schemes. Returns ``null`` on rejection so
``renderCommentBody`` omits the Pull-request row entirely rather
than emitting a broken or unsafe link.

## Tests

+22 regression tests net (fanout 7 for aws-samples#1+aws-samples#5 + 3 for aws-samples#9; github-comment
12 for aws-samples#12):

- Fanout partial-batch: poison-pill isolation, mixed-batch (good
  record NOT in failures), observability warn, empty-failures
  regression guard, baseline pin that today's ``Promise.allSettled``
  containment still works.
- renderCommentBody numeric self-defense: string-typed values render
  correctly; non-finite strings collapse to null with warn; null
  does NOT warn.
- sanitizeMarkdownLinkTarget unit tests: accept clean http/https,
  reject 9 injection patterns, reject 4 non-http schemes
  (``javascript:``, ``data:``, ``file:``, ``ftp:``), reject
  malformed, handle null/undefined. Plus end-to-end assertions on
  ``renderCommentBody`` proving the PR row is omitted on rejection.

CDK suite: 1029 passing (was 1001).

Refs: krokoko code review on PR aws-samples#52 (findings 1, 5, 9, 12)
isadeks pushed a commit to isadeks/sample-autonomous-cloud-coding-agents that referenced this pull request May 13, 2026
…gs (aws-samples#79 review aws-samples#5)

The reaction / delete helpers (``addReaction``, ``removeReaction``,
``deleteMessage``) used to log every catch at warn with a single
generic event key, lumping API-level rejections (e.g. ``no_reaction``)
together with infrastructure failures (DNS lookup, TLS handshake,
fetch timeout, JSON parse error from a hostile gateway). Operators
who alarmed on the warn rate saw a flat signal that masked
genuine infra problems.

Split the boundary:

  - API-level (``!result.ok`` after a successful HTTP call) stays at
    warn with channel-specific event keys
    (``fanout.slack.reaction_add_api_error``,
    ``fanout.slack.reaction_remove_api_error``,
    ``fanout.slack.message_delete_api_error``). These are per-message
    UX problems; operators don't page.

  - Network errors (the outer ``catch (err)`` after ``fetch``)
    promote to ``logger.error`` with dedicated event keys
    (``fanout.slack.reaction_add_network_error``,
    ``fanout.slack.reaction_remove_network_error``,
    ``fanout.slack.message_delete_network_error``) and ``error_id``s
    (``FANOUT_SLACK_REACTION_NETWORK``,
    ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable
    signal. User-visible symptoms when these fire silently:
    stale emoji reactions (hourglass never swaps to ✅) and
    orphaned intermediate messages.

Behaviour unchanged: errors are still swallowed (per-message
reactions and intermediate cleanup are best-effort by design;
they must not fail the batch), but operators now get distinct
metrics for each failure class.
github-merge-queue Bot pushed a commit that referenced this pull request May 13, 2026
#79)

* feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (#64)

Move the Slack outbound delivery off its own DynamoDB Streams consumer
onto FanOutConsumer as a per-channel dispatcher. Drops TaskEventsTable
from 2 concurrent stream readers to 1, restoring headroom for future
channels (Email, Teams, etc.) without exceeding the documented
DynamoDB Streams 2-reader-per-shard practical limit.

The PR also addresses an adversarial code review on the original
migration; the body below walks through each piece in the order it
landed.

## (a) Migration

- `cdk/src/handlers/slack-notify.ts` — rewritten as exported
  `dispatchSlackEvent(event, ddb)` plus a tagged `SlackApiError`
  class. The standalone `handler(event)` stream entrypoint is gone;
  the FanOutConsumer is now the only TaskEventsTable stream reader.
  Behaviour preserved bit-for-bit: channel_source==='slack' gate,
  terminal-event dedup via conditional UpdateItem on
  `channel_metadata.slack_notified_terminal`, threaded replies under
  the @mention or task_created message, emoji transitions
  (eyes -> hourglass -> ✅/❌/🚫/⏲), DM channel_id -> user_id rewrite,
  intermediate session+created message cleanup on terminal events.

- `cdk/src/handlers/fanout-task-events.ts` — replaces the log-only
  `dispatchToSlack` stub with a wrapper that calls dispatchSlackEvent
  and routes errors via the new typed contract (see (b) below). Slack
  defaults gain task_created, session_started, task_timed_out so the
  router fans out the lifecycle events the old SlackNotifyFn handled;
  the dispatcher's channel_source gate keeps non-Slack tasks unaffected.

- `cdk/src/constructs/fanout-consumer.ts` — adds a scoped
  `secretsmanager:GetSecretValue` grant on `bgagent/slack/*` so the
  fanout Lambda can fetch per-workspace bot tokens. Same scope the old
  SlackNotifyFn role held.

- `cdk/src/constructs/slack-integration.ts` — deletes SlackNotifyFn,
  its DynamoEventSource, its IAM policy, and its NagSuppressions
  entry. Drops the now-unused StartingPosition / FilterCriteria /
  FilterRule / lambdaEventSources imports.

After this lands, `aws lambda list-event-source-mappings` shows
exactly one consumer of the TaskEventsTable stream (FanOutFn);
verified on the dev stack with end-to-end @mention + cancel + CLI
isolation scenarios.

## (b) Review fix #1 — partial-batch retry semantics (BLOCKER)

The first review pass found that the post-migration handler silently
dropped Slack-side infra errors (DDB throttle on the GetItem,
Secrets Manager 5xx, transient Slack timeout). Pre-migration the
SlackNotifyFn handler rethrew non-SlackApiError so Lambda retried
the batch; post-migration `Promise.allSettled` swallowed the
rejection and routeEvent returned an empty list with no escalation
path to `batchItemFailures`.

routeEvent's return type changed from `NotificationChannel[]` to
`{ dispatched, infraRejections }`. The handler now pushes the
record into `batchItemFailures` whenever `infraRejections.length>0`,
so Lambda replays the record under the partial-batch contract. The
warn line on rejection is tagged `retryable: true` so operators can
alert distinctly from the channel-terminal swallow path.

GitHub got the symmetric treatment: 4xx (excluding the existing 401
and 404 handling) is now treated as a channel-terminal swallow via
`fanout.github.api_error` instead of escalating to retry.

## (c) Review fix #2 — split SlackApiError into terminal + retryable

Originally any `!result.ok` Slack response was wrapped in
SlackApiError and swallowed. That collapsed retryable codes
(`ratelimited`, `service_unavailable`, `internal_error`,
`fatal_error`, `request_timeout`) into the same swallow as
`channel_not_found` — a tier-1 Slack outage would silently drop
every message.

Introduced `TERMINAL_SLACK_API_ERRORS` set + `classifySlackError`
helper. Terminal codes still throw SlackApiError (router swallows).
Retryable codes throw a plain Error so the router classifies them
as infra rejections and Lambda replays.

## (d) Review fix #3 — NOTIFIABLE_EVENTS / CHANNEL_DEFAULTS drift

The original migration added task_created/session_started/task_timed_out
to CHANNEL_DEFAULTS.slack but the dispatcher's NOTIFIABLE_EVENTS gate
already excluded several events the router was subscribing Slack to
(agent_error, pr_created, task_stranded). Result: Slack was reported
as `dispatched` for events it silently dropped — telemetry lied,
agent_error never reached operators on Slack-origin tasks, and
task_stranded rendered the generic "Event: task_stranded for
owner/repo" fallback (UX regression).

Added render cases for task_stranded and agent_error in slack-blocks.ts
and added them to NOTIFIABLE_EVENTS. Forward-compat approval_required
and status_response stay out of NOTIFIABLE_EVENTS until their emitters
ship; a new cross-file consistency test in fanout-task-events.test.ts
fails if anyone re-introduces the drift.

The Slack dispatcher wrapper now passes `effectiveEventType` so an
agent_milestone(pr_created) wrapper is unwrapped before NOTIFIABLE_EVENTS
matching. Without the rewrite, the dispatcher would short-circuit
on the wrapper string `agent_milestone`.

## (e) Review fix #4 — conditional UpdateItem on lifecycle persists

Once the BLOCKER fix made batches retry, the original task_created
and session_started UpdateItem calls became hazardous: a Slack POST
that succeeded but whose follow-up UpdateItem failed transiently
would, on retry, post a second root and overwrite slack_thread_ts —
orphaning every threaded reply that had threaded under the first ts.

Both UpdateItems now carry an `attribute_not_exists` ConditionExpression
on the relevant `channel_metadata.slack_*_msg_ts`. On
ConditionalCheckFailedException the handler logs at info, deletes
the duplicate Slack message via `chat.delete`, and returns. Sibling
retry wins the race; the duplicate is cleaned up.

## (f) Dev-stack regression: drop pr_created from Slack defaults

Live verification surfaced a UX duplication: pr_created (subscribed
in CHANNEL_DEFAULTS.slack as the original §6.2 design called for) and
task_completed both rendered messages with View PR buttons, posted
seconds apart. The original SlackNotifyFn had silently dropped
pr_created (NOTIFIABLE_EVENTS gate), so users hadn't relied on it.

Removed pr_created from CHANNEL_DEFAULTS.slack and from
NOTIFIABLE_EVENTS, and removed the prCreatedMessage renderer.
GitHub keeps pr_created (its edit-in-place comment surface
genuinely benefits from the early checkpoint).

## Verification

- mise //cdk:compile  — clean
- mise //cdk:test     — 1183 / 1183 pass (8 net-new tests added for
                        the review fixes: NOTIFIABLE_EVENTS drift
                        guard, retryable Slack codes, GitHub 4xx
                        swallow, infra rejection escalation,
                        SlackApiError swallow, task_stranded render)
- mise //cdk:eslint   — clean
- mise //cdk:synth    — confirms exactly one Lambda::EventSourceMapping
                        on TaskEventsTable, pointing at FanOutFn
- Dev-stack scenarios — @mention happy path, Cancel button, CLI submit
                        (channel_source=api -> zero Slack dispatches,
                        GitHub edit-in-place still fires)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(fanout): retry GitHub 403/429 instead of swallowing as terminal (#79 review #1)

PR #79 review found that the new 4xx terminal-swallow path treats
HTTP 403 and 429 as channel-terminal — but on GitHub these are
transient rate-limit responses (403 with "API rate limit exceeded",
429 "Too Many Requests"). Under a reconciliation wave that touches
many tasks, an entire window of GitHub comment updates would be
permanently dropped with only a warn log.

Carve out 403 and 429 from the swallow guard so they propagate as
infra rejections through ``Promise.allSettled``. The record lands
in ``batchItemFailures`` and Lambda replays until the rate-limit
window clears (or DLQs after ``retryAttempts``).

Test coverage: parametrized over 403 + 429 with a GitHubCommentError
mock at the helper boundary, asserting the record's eventID surfaces
in ``batchItemFailures`` rather than being absorbed.

* fix(fanout): guard Slack Secrets Manager grant on a prop (#79 review #2)

Every other external-service grant in FanOutConsumer (taskTable,
repoTable, githubTokenSecret) is gated by ``if (props.X)``, so a
deployment that hasn't onboarded the corresponding service stays
free of dangling IAM permissions. The original migration broke the
pattern with an unconditional ``bgagent/slack/*`` Secrets Manager
grant — dev stacks without Slack onboarding ended up holding read
permission on a resource pattern they never use, with a misleading
``cdk-nag AwsSolutions-IAM5`` suppression reason.

Adds an optional ``slackSecretArnPattern`` prop on
``FanOutConsumerProps``; the policy statement is only attached when
the prop is set. ``cdk/src/stacks/agent.ts`` now computes the
``bgagent/slack/*`` ARN inline and passes it through, mirroring the
other guarded props. ``ArnFormat`` and ``Stack`` imports moved out
of fanout-consumer.ts since the construct no longer needs them.

No changes to live behaviour — agent.ts always passes the prop, so
the IAM policy still attaches in production. The dispatcher will
log-and-fail-retry on a missing pattern (covered by review #3 fix).
Test gap covering the construct itself ships in a follow-up commit
(test gap #34).

* fix(fanout): throw on missing TASK_TABLE_NAME env var (#79 review #3)

Pre-fix: when ``TASK_TABLE_NAME`` was unset on a Slack-subscribed
event, ``dispatchSlackEvent`` returned silently after a warn line.
The router counted Slack as ``dispatched`` and a broken stack
quietly dropped every Slack notification — operators only saw it
in the warn-rate metric, with no rejected-channel signal.

Post-fix: throw a plain Error so the rejection propagates as an
infra rejection through ``Promise.allSettled``. The router pushes
the record into ``batchItemFailures``, Lambda retries the batch,
the ``fanout.dispatcher.rejected`` warn fires per record, and
operators get a distinct alarm.

Also bumps the existing log line from ``warn`` to ``error`` and
attaches an ``error_id: FANOUT_SLACK_MISSING_TASK_TABLE`` so the
deployment-bug case can be distinguished from per-record failures.

Test: ``throws when TASK_TABLE_NAME env var is missing`` deletes
the env var, asserts the throw, asserts no DDB call was attempted
(env-var guard fires first).

* fix(fanout): match SlackApiError by name as well as instanceof (#79 review #7)

When a bundler ever duplicates the slack-notify module (rare with
NodejsFunction tree-shaking but possible if dual-bundled), two
distinct SlackApiError classes coexist and ``instanceof`` against
one fails for instances of the other. The dispatcher would see a
foreign-class SlackApiError, fall through to the rethrow branch,
and the router would treat it as an infra rejection — flipping a
channel-terminal swallow into infinite Lambda retries.

Add an ``err.name === 'SlackApiError'`` fallback so the swallow
branch fires either way. Mirrors the duck-typed
``GitHubCommentError`` check used elsewhere in the same handler.

Test: synthesise a plain Error with name === 'SlackApiError'
(NOT an instance of the mock's SlackApiError class) and assert
batchItemFailures stays empty — proving the swallow path catches
both shapes.

* fix(fanout): extend TERMINAL_SLACK_API_ERRORS with permission codes (#79 review #8)

Original set omitted documented Slack permission/scope failures.
Codes outside the set fall to the retryable branch, so a
misconfiguration like ``ekm_access_denied`` or ``missing_scope``
would burn 3 Lambda retries before DLQ on every event — even
though the failure is fundamentally a configuration bug that no
retry can clear.

Adds:
  - Permission/scope: missing_scope, ekm_access_denied,
    team_access_not_granted, posting_to_general_channel_denied
  - Payload shape: invalid_arguments

Reorganized the set into commented blocks (channel-shape, auth,
permission/scope, payload-shape) so future additions go in the
right bucket and the rationale stays visible.

Test coverage: parametrized over the full TERMINAL_SLACK_API_ERRORS
set (21 codes) — every one must throw SlackApiError so the router
swallows it. The existing retryable test.each remains intact and
covers the negative-class case (codes outside the set throw a
plain Error and escalate to retry).

* fix(fanout): promote Slack reaction/delete network errors to error logs (#79 review #5)

The reaction / delete helpers (``addReaction``, ``removeReaction``,
``deleteMessage``) used to log every catch at warn with a single
generic event key, lumping API-level rejections (e.g. ``no_reaction``)
together with infrastructure failures (DNS lookup, TLS handshake,
fetch timeout, JSON parse error from a hostile gateway). Operators
who alarmed on the warn rate saw a flat signal that masked
genuine infra problems.

Split the boundary:

  - API-level (``!result.ok`` after a successful HTTP call) stays at
    warn with channel-specific event keys
    (``fanout.slack.reaction_add_api_error``,
    ``fanout.slack.reaction_remove_api_error``,
    ``fanout.slack.message_delete_api_error``). These are per-message
    UX problems; operators don't page.

  - Network errors (the outer ``catch (err)`` after ``fetch``)
    promote to ``logger.error`` with dedicated event keys
    (``fanout.slack.reaction_add_network_error``,
    ``fanout.slack.reaction_remove_network_error``,
    ``fanout.slack.message_delete_network_error``) and ``error_id``s
    (``FANOUT_SLACK_REACTION_NETWORK``,
    ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable
    signal. User-visible symptoms when these fire silently:
    stale emoji reactions (hourglass never swaps to ✅) and
    orphaned intermediate messages.

Behaviour unchanged: errors are still swallowed (per-message
reactions and intermediate cleanup are best-effort by design;
they must not fail the batch), but operators now get distinct
metrics for each failure class.

* fix(fanout): emit fanout.slack.dup_delete_failed on ghost-message accumulation (#79 review #6)

The conditional UpdateItem dup-delete path
(``task_created`` / ``session_started`` lifecycle persists)
calls ``deleteMessage`` to clean up the duplicate Slack message
that landed when a sibling retry won the race. The delete is
inherently best-effort — but if it fails, the duplicate becomes a
permanent ghost in the thread and operators had no way to alarm
on the rate.

Refactor ``deleteMessage`` to return a boolean (``true`` on success
or ``message_not_found``-as-already-gone, ``false`` otherwise) and
emit a dedicated ``fanout.slack.dup_delete_failed`` event with an
``error_id: FANOUT_SLACK_DUP_DELETE_FAILED`` from the dup-delete
callsites when the cleanup couldn't complete.

The terminal-event cleanup paths (``slack_session_msg_ts``,
``slack_created_msg_ts``) intentionally don't fire this event —
those paths target genuinely-stale UX cleanup, not retry-driven
duplicates, so an alarm there would be noise.

No new tests beyond the existing dup-delete coverage; the
``deleteMessage`` return value isn't yet asserted at the unit
level, but the behavior is fully exercised by the existing
``dup-delete`` integration paths (test gap #31 will add an
explicit failure-path assertion when it lands).

* chore(fanout): tighten RouteOutcome arrays to ReadonlyArray (#79 review #9)

``RouteOutcome.dispatched`` and ``infraRejections`` were typed as
plain ``NotificationChannel[]`` — which made ``readonly`` on the
property prevent reassignment but still allow callers to mutate the
underlying array via ``.push``, ``.splice``, or ``.sort``. Inconsistent
with the ``ReadonlySet<string>`` used for ``CHANNEL_DEFAULTS`` in the
same file.

Tightening to ``ReadonlyArray<NotificationChannel>`` makes the
contract honest: the router owns the arrays, callers read them.
Test suite updated to use ``[...outcome.dispatched].sort()`` where it
previously called ``.sort()`` directly — the explicit copy makes the
intent clear and would have surfaced any silent test-side mutation.

* refactor(fanout): make SlackDispatchEvent a type alias of FanOutEvent (#79 review #10)

The two interfaces were structurally identical: same five fields,
same readonly modifiers, same metadata shape. The decoupling was
purely nominal and a silent-drift footgun — adding a field to
``FanOutEvent`` (e.g. when the router starts plumbing an
``approval_required`` ID through) would not flow into
``SlackDispatchEvent``, leaving the dispatcher unaware until a
downstream test happened to fail.

Replace with a one-line type alias:

  export type SlackDispatchEvent = FanOutEvent;

The slack-notify module now type-imports ``FanOutEvent`` from
fanout-task-events. ``import type`` is erased at compile time, so
the runtime bundle still has the one-way dep
(fanout-task-events → slack-notify) — no module-cycle hazard.

Reviewer-suggested ``Pick<FanOutEvent, 'task_id' | …>`` was
considered and rejected: the dispatcher uses every field of
``FanOutEvent``, so the Pick would just enumerate the same five
fields with extra noise. A direct alias keeps the intent obvious
and prevents drift identically.

* fix(fanout): generalize Slack dedup to cover agent_error + log Retry-After (#79 review #4)

PR #79 review #4 surfaced a sibling-channel-failure hazard: when
GitHub or Email rate-limits, the record lands in
``batchItemFailures``. On the Lambda retry, every Slack-subscribed
event for that record runs again. Terminal events were already
guarded by ``slack_notified_terminal``; ``agent_error`` was not —
operators would page twice on a single agent failure if a sibling
channel happened to fail.

Generalize the dedup mechanism. ``TERMINAL_EVENTS`` is replaced by
a ``SLACK_DEDUP_ATTRIBUTE`` map that marks each event type with the
``channel_metadata`` attribute that should guard the post:

  - 5 terminals share ``slack_notified_terminal`` (any first-arriving
    terminal claims the right; subsequent terminals dedup against it)
  - ``agent_error`` gets its own ``slack_dispatched_agent_error``
    so a duplicate agent_error doesn't reuse the terminal slot
  - ``task_created`` / ``session_started`` map to ``null`` because
    they already use the per-event ``slack_*_msg_ts`` conditional
    persists from review #1 — the conditional already provides
    full idempotency (a separate marker would be redundant)

Also surfaces Slack's ``Retry-After`` header on rate-limited
responses through a dedicated ``fanout.slack.retryable_api_error``
warn so operators reading CloudWatch can see the recovery window
instead of guessing from sustained warn rate.

Tests:
  - logs Retry-After header on rate-limited Slack responses (new):
    asserts ``retry_after_seconds`` propagates from Slack's response
    header into the warn metadata
  - existing terminal-codes parametrized test untouched (terminal
    branch doesn't read headers)
  - existing retryable test gains a ``headers: { get: () => null }``
    stub on the fetch mock so the headers.get call doesn't crash

Reviewer suggested a per-channel dispatch bitmap as the alternative.
Rejected as premature: the duplicate-GitHub-PATCH is harmless
(idempotent), Email is still a stub, and the dedup map covers
the specific agent_error pain identified above. A bitmap would add
a new table + IAM grants + per-dispatch DDB cost for a hypothetical
problem (Slack rate-limiting AND a sibling channel failure).

* test(fanout): conditional UpdateItem race + dup-delete coverage (#79 test gap)

Adds 4 tests covering the lifecycle-persist conditional path that
review fix #1 introduced and review fix #6 hardened. Pre-PR-#79 the
only ConditionalCheckFailed coverage was the terminal-dedup path;
the new lifecycle-persist + dup-delete code lacked direct assertions
and was flagged 9/10 criticality by the reviewer.

  - task_created persist ConditionalCheckFailed → posts duplicate
    then deletes it: pins the cleanup behaviour that prevents ghost
    task_created posts in the channel
  - session_started persist ConditionalCheckFailed → posts duplicate
    then deletes it: parallel coverage for the other lifecycle
    attribute (slack_session_msg_ts)
  - dup-delete failure emits fanout.slack.dup_delete_failed with
    error_id: pins the operator-alarm signal added in review fix #6;
    asserts both the event key and the FANOUT_SLACK_DUP_DELETE_FAILED
    error_id propagate
  - chat.delete returning message_not_found is treated as success
    (no dup_delete_failed): negative-class assertion. Prevents
    false-positive alarms when the race resolves cleanly (the
    duplicate was already deleted by a prior retry).

The ghost / message_not_found tests use ``fetchMock.mockImplementation``
URL-routing rather than ``.mockResolvedValueOnce`` chains because
``updateReaction`` issues 2-3 reaction-API fetches between
chat.postMessage and chat.delete; routing by URL keeps the test
focused on the load-bearing chat.delete behaviour without coupling
to reaction call order.

* test(fanout): cover task_stranded + agent_error renderers (#79 test gap #32)

Pre-PR-#79 the new ``taskStrandedMessage`` and ``agentErrorMessage``
helpers in slack-blocks.ts had no direct unit tests. Reviewer flagged
this as a 7/10 gap because the renderers carry the prior_status /
error_type / message_preview metadata threaded through from the
event source — silent drift in the metadata field names would
produce ugly fallback messages in production.

Adds 5 tests:

  - task_stranded WITH metadata renders the prior_status parenthetical
    (``Task stranded for org/repo (last status: RUNNING)``) so
    operators can tell at a glance whether the task hung in HYDRATING
    vs RUNNING — without the parenthetical the reviewer's "generic
    Event: ..." UX regression would resurface.
  - task_stranded WITHOUT metadata still renders cleanly (legacy
    events written before the reconciler started stamping metadata
    must not crash or leak ``undefined``).
  - agent_error with full metadata (error_type + message_preview)
    renders the rotating_light, type, and preview.
  - agent_error WITHOUT metadata stays sensible — no leaked
    ``undefined`` strings or empty ``_Type:_`` line.
  - agent_error truncates a 500-char message_preview to keep Slack
    channel UX readable.

* test(fanout): cover agent_error dedup + dedup-slot isolation (#79 test gap #33)

Pre-PR-#79 review-fix #4 there was no direct test for the
``slack_dispatched_agent_error`` dedup attribute or its interaction
with the existing ``slack_notified_terminal`` slot. A future
refactor that collapsed the two slots — or renamed one of them —
would silently break the sibling-channel-failure-retry guarantee
that fix #4 added.

Adds 4 tests:

  - ``agent_error claims its own dedup attribute``: pins the
    UpdateExpression and ConditionExpression strings so a refactor
    that renames the attribute breaks loudly.
  - ``agent_error retry hits the dedup guard``: end-to-end scenario
    matching review #4 — task already has
    ``slack_dispatched_agent_error: true``, retry must short-circuit
    before chat.postMessage. Without the guard, a second
    rotating_light fires.
  - ``terminal dedup attribute is per-class``: a flaky
    task_completed-then-task_failed sequence dedups against the
    same ``slack_notified_terminal`` slot. Catches the regression
    where the orchestrator emits both terminal types and we'd
    otherwise post both ✅ and ❌.
  - ``agent_error and terminals use distinct dedup slots``: the
    important negative — having ``slack_dispatched_agent_error``
    set must NOT shadow a subsequent ``task_completed``. Pins
    the slot separation so a future merge into a single slot
    can't silently drop terminals after an agent_error.

* test(fanout): add construct-level tests for FanOutConsumer (#79 test gap #34)

The construct shipped on issue #64 with no unit-level coverage of
its IAM contract. The only synth-level signal lived inside
``slack-integration.test.ts`` ("0 EventSourceMapping") which proved
the migration didn't regress the OTHER construct. Reviewer flagged
this 6/10 — and the gap is what allowed review #2 (unconditional
Slack secret grant) to slip through in the first place.

Adds 6 tests:

  - ``attaches a single DynamoEventSource on the TaskEventsTable
    stream``: pins the architectural invariant — issue #64 was
    fundamentally about reaching exactly-one stream reader. Adding
    a second consumer must fail this test loudly.
  - ``creates a DLQ for the fanout Lambda``: pins retention period
    + presence; a DLQ-less deployment would silently drop
    poison-pill records past retryAttempts.
  - ``omits the bgagent/slack/* grant when slackSecretArnPattern
    is not provided``: the review #2 invariant. Iterates every
    IAM::Policy and asserts NONE of them grant secretsmanager:*
    on a bgagent/slack/* ARN. A regression that re-introduces the
    unconditional grant breaks this test.
  - ``attaches the bgagent/slack/* grant only when
    slackSecretArnPattern is provided``: the positive case. Pins
    the grant shape (action, effect, resource pattern).
  - ``passes TASK_TABLE_NAME env var when taskTable is provided``:
    review #3 dependency — the dispatcher throws on missing env.
  - ``omits TASK_TABLE_NAME env var when taskTable is not provided``:
    graceful degrade for dev stacks that haven't onboarded the
    TaskTable yet (matches the construct's documented contract).

* test(fanout): cover task_stranded through terminal dedup (#79 test gap #35)

The reconciler at handlers/reconcile-stranded-tasks.ts:170 emits
BOTH ``task_stranded`` and ``task_failed`` for a heartbeat-expired
task — one for the operator signal, one to drive the FAILED status
transition. Pre-PR-#79 this pair had no test coverage; reviewer
flagged this 8/10 because the visible failure mode (a paired
"Task stranded" + "Task failed" double-page in Slack) would
surface in production but be silent in CI.

Adds 2 tests:

  - ``task_stranded posts and writes the terminal dedup marker on
    first arrival``: pins that task_stranded participates in the
    shared terminal slot and renders the warning message with
    metadata. Catches a regression that omits task_stranded from
    the dedup map entirely.
  - ``task_stranded after a sibling task_failed dedups``: the
    operational scenario — task_failed already claimed
    ``slack_notified_terminal``; the subsequent task_stranded must
    short-circuit before chat.postMessage. Without this guard,
    operators get the double-page the reviewer warned about.

* fix(fanout): re-read TaskRecord before terminal cleanup to close orphan-message race

Live observation during PR #79 review verification: the same Slack
@mention happy path sometimes leaves the 🚀 task_created message
in the thread (orphaned beside the ✅ task_completed) and sometimes
deletes it cleanly. The race window:

  1. ``task_created`` stream batch posts the rocket message and
     persists ``slack_created_msg_ts`` via the conditional UpdateItem
     introduced in PR #79 review fix #1.
  2. ``task_completed`` stream batch fires ~30s later. Its initial
     GetItem races the prior UpdateItem and sees a stale
     ``channel_metadata`` WITHOUT ``slack_created_msg_ts``.
  3. The terminal cleanup branch checks
     ``channelMeta.slack_created_msg_ts`` — undefined — silently skips
     the chat.delete. The rocket message stays in the thread.

Add a fresh GetItem inside the TERMINAL_EVENTS cleanup branch, after
the dedup UpdateItem has linearized our view of the table. Any prior
``slack_*_msg_ts`` writes are visible by then, so the cleanup fires
correctly. On a re-read failure (DDB throttle / transient blip) we
fall back to the dispatch-entry snapshot and emit
``fanout.slack.cleanup_reread_failed`` so operators can alarm on
the rate.

Pre-existing race (the unconditional UpdateItem in pre-PR-#79 was
the same shape — wrote, GetItem on the next batch could miss it).
PR #79 doesn't introduce it but doesn't fix it either; this commit
does, since the live screenshot evidence appeared during review
verification.

Tests:
  - ``terminal cleanup re-reads TaskRecord``: scripts a stale
    dispatch-entry GetItem followed by a fresh re-read GetItem with
    ``slack_created_msg_ts`` present; asserts chat.delete fires
    against the freshly-read ts.
  - ``terminal cleanup falls back to dispatch-entry snapshot when
    re-read fails``: defense-in-depth — DDB throttle on the re-read
    must not break terminal delivery; cleanup uses the entry snapshot
    and emits the fallback warn.

---------

Co-authored-by: bgagent <bgagent@noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alain Krok <alkrok@amazon.com>
isadeks pushed a commit to isadeks/sample-autonomous-cloud-coding-agents that referenced this pull request May 26, 2026
Addresses the blocker + critical items from PR review:

- Refresh-token race (review blocker). Linear rotates refresh_tokens
  on every use; concurrent Lambdas/agents racing the same secret will
  all read the same expiring token and one's refresh will succeed
  while the others get `invalid_grant`. On `invalid_grant`, re-read
  the secret from Secrets Manager (bypassing cache). If the
  refresh_token has changed, another caller already rotated; use the
  freshly-read token (or retry refresh once if it's also expiring).
  If unchanged, the refresh_token is permanently rejected and the
  workspace needs re-onboarding. Implemented in both the TS resolver
  (linear-oauth-resolver.ts) and Python resolver (config.py).

- Unguarded bedrock_agentcore import in agent/src/server.py
  (review critical). The bare `from bedrock_agentcore.runtime.context
  import BedrockAgentCoreContext` inside `_run_task_background` killed
  the entire pipeline thread with no diagnostic if the SDK was
  missing or its module structure changed. Wrap in
  try/except (ImportError, AttributeError) and log via _warn_cw —
  the Linear token resolver has its own SM fallback, so the agent
  can proceed without the workload-token bridge.

- Cache invalidation on fetch-level refresh failure (review high).
  The TS resolver's `invalidateLinearOauthCache()` only ran in the
  `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the
  catch returned null without invalidating, leaving the stale
  expiring token cached for 60s and hammering Linear's token
  endpoint. Move invalidate into the fetch-level catch too.

- Malformed expires_at log (review medium). The Python `_is_expiring`
  caught `ValueError` and silently returned True, masking
  consistently-bad writes. Add a WARN log so operators see the bad
  data instead of just an unexplained refresh on every task.

- Positive-path refresh log (review non-blocking aws-samples#5). Added
  INFO-level breadcrumb on successful refresh in both resolvers
  so operators diagnosing intermittent 401s have a trace of which
  workspace refreshed and to what expiry.

11/11 existing resolver unit tests still pass; will add tests for
the new race-recovery branch in a followup commit.
isadeks added a commit to isadeks/sample-autonomous-cloud-coding-agents that referenced this pull request May 26, 2026
Addresses the blocker + critical items from PR review:

- Refresh-token race (review blocker). Linear rotates refresh_tokens
  on every use; concurrent Lambdas/agents racing the same secret will
  all read the same expiring token and one's refresh will succeed
  while the others get `invalid_grant`. On `invalid_grant`, re-read
  the secret from Secrets Manager (bypassing cache). If the
  refresh_token has changed, another caller already rotated; use the
  freshly-read token (or retry refresh once if it's also expiring).
  If unchanged, the refresh_token is permanently rejected and the
  workspace needs re-onboarding. Implemented in both the TS resolver
  (linear-oauth-resolver.ts) and Python resolver (config.py).

- Unguarded bedrock_agentcore import in agent/src/server.py
  (review critical). The bare `from bedrock_agentcore.runtime.context
  import BedrockAgentCoreContext` inside `_run_task_background` killed
  the entire pipeline thread with no diagnostic if the SDK was
  missing or its module structure changed. Wrap in
  try/except (ImportError, AttributeError) and log via _warn_cw —
  the Linear token resolver has its own SM fallback, so the agent
  can proceed without the workload-token bridge.

- Cache invalidation on fetch-level refresh failure (review high).
  The TS resolver's `invalidateLinearOauthCache()` only ran in the
  `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the
  catch returned null without invalidating, leaving the stale
  expiring token cached for 60s and hammering Linear's token
  endpoint. Move invalidate into the fetch-level catch too.

- Malformed expires_at log (review medium). The Python `_is_expiring`
  caught `ValueError` and silently returned True, masking
  consistently-bad writes. Add a WARN log so operators see the bad
  data instead of just an unexplained refresh on every task.

- Positive-path refresh log (review non-blocking aws-samples#5). Added
  INFO-level breadcrumb on successful refresh in both resolvers
  so operators diagnosing intermittent 401s have a trace of which
  workspace refreshed and to what expiry.

11/11 existing resolver unit tests still pass; will add tests for
the new race-recovery branch in a followup commit.
isadeks added a commit to isadeks/sample-autonomous-cloud-coding-agents that referenced this pull request May 26, 2026
… 2.0b) (aws-samples#160)

* feat(linear): resolve API token via AgentCore Identity (Phase 2.0a)

Migrates the agent runtime's Linear personal API token resolution from
AWS Secrets Manager to AWS Bedrock AgentCore Identity. This is the
"validate Identity SDK" step of the v2 plan; Phase 2.0b will swap the
API key for OAuth and converge Linear MCP onto AgentCore Gateway in
one cutover.

Per Alain's guidance: "start by using api key, if it works, switch to
oauth. you will setup an outbound auth for your server using agentcore
identity. that identity can be (AC identity is like a wrapper around
secrets manager) api key or oauth."

Lambdas (orchestrator + processor) intentionally keep using Secrets
Manager via the existing `LinearApiTokenSecret` for now. The Python
`bedrock_agentcore` SDK has no Node.js equivalent — Lambda migration
requires `@aws-sdk/client-bedrock-agentcore` raw API calls and folds
into 2.0b's bigger refactor. End-state of 2.0a: agent reads from
Identity, Lambdas read from Secrets Manager, both pointing at the same
underlying token value (admin populates both).

`agent/src/config.py::resolve_linear_api_token`:

  - Drops boto3 SecretsManager fetch + `LINEAR_API_TOKEN_SECRET_ARN` env.
  - Reads new env `LINEAR_API_KEY_PROVIDER_NAME` (provider name in
    Identity vault).
  - Calls `IdentityClient.get_api_key()` with the workload access token
    auto-injected into `BedrockAgentCoreContext` by AgentCore Runtime
    (verified by reading the SDK's `auth.py` decorator implementation —
    no manual workload-identity mint needed inside the runtime).
  - Caches the resolved token in `LINEAR_API_TOKEN` so downstream
    consumers stay unchanged: `channel_mcp.py`'s `${LINEAR_API_TOKEN}`
    placeholder in `.mcp.json` and `linear_reactions.py`'s GraphQL
    Authorization header.

Preserves PR aws-samples#87's nice-to-have improvements:

  - `ImportError` graceful fallback (now for `bedrock_agentcore` instead
    of `boto3`) — degrade with WARN, don't crash the agent.
  - `AccessDeniedException` and `ResourceNotFoundException` logged at
    ERROR severity (persistent IAM/config bugs that should page).
    Other ClientErrors stay at WARN (transient throttle/network).

`agent/pyproject.toml`: adds `bedrock-agentcore==1.9.1` dep.

`cdk/src/stacks/agent.ts`:

  - On the AgentCore runtime: drops `linearIntegration.apiTokenSecret.
    grantRead(runtime)` and the `LINEAR_API_TOKEN_SECRET_ARN` env-var
    override. Adds `LINEAR_API_KEY_PROVIDER_NAME` env (hardcoded
    `'linear-api-key'` for now; can parametrize later via context if
    multi-environment naming is needed) and IAM permissions for
    `bedrock-agentcore:GetResourceApiKey` and
    `bedrock-agentcore:GetWorkloadAccessToken`.
  - Lambdas (orchestrator + processor) untouched — they still grant on
    the Linear secret and read from Secrets Manager.
  - Resource scope on the new IAM is `*` for now; AgentCore Identity ARN
    format isn't fully standardized in public docs as of 2026-05-15.
    Tighten in 2.0b when OAuth migration documents the canonical
    resource shape.

`docs/guides/LINEAR_SETUP_GUIDE.md`: adds Step 4.5 documenting the
one-time `agentcore add credential --type api-key --name linear-api-key`
admin command users must run alongside the existing `bgagent linear
setup` wizard. Notes that Lambdas keep Secrets Manager temporarily and
2.0b will retire the dual-store setup. Starlight mirror synced.

`agent/tests/test_config.py::TestResolveLinearApiToken` — 10 tests
covering: cached env var fast-path; missing provider name; missing
region; workload token absent (outside runtime); happy path with
env-var side-effect; botocore error swallowed with WARN; SDK returns
None defensively; ImportError fallback; AccessDeniedException → ERROR
severity; ResourceNotFoundException → ERROR severity.

542 agent / 1271 cdk / 196 cli, all green. Lint + typecheck clean.
CDK synth clean.

`bedrock_agentcore` SDK confirmed working in our runtime image (verified
in `node_modules` post-install). The `BedrockAgentCoreContext` workload
token auto-injection is documented behaviour for code running inside
AgentCore Runtime — verified by reading the SDK's `@requires_api_key`
decorator implementation, which uses the same context lookup we use
here.

Stacked on PR aws-samples#87 (`feat/linear-processor-feedback`). Will conflict on
`config.py` and `test_config.py` if aws-samples#87 needs further rework before
merge — happy to rebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(linear): use aws CLI for credential provider, not the agentcore command

The setup guide referenced `agentcore add credential` which doesn't actually
work end-to-end:

  - The Python `bedrock-agentcore-starter-toolkit` CLI (`agentcore`) only
    exposes agent-lifecycle commands; there is no `credential-provider`
    subcommand. Confirmed by reading the toolkit's CLI reference and by
    user trying `agentcore configure credential-provider --type api-key
    --name ...` and receiving `No such command 'credential-provider'`.
  - The new npm `@aws/agentcore` CLI does have `agentcore add credential`
    but uses a declarative project model — the credential lands in
    `agentcore.json` + `.env.local`, not the actual AgentCore Identity
    vault, until `agentcore deploy` runs against a project structured for
    that CLI. ABCA isn't structured that way.

Switch the docs to the plain AWS CLI which works directly against the
AgentCore Identity API:

    aws bedrock-agentcore-control create-api-key-credential-provider \
      --name linear-api-key \
      --api-key "<paste lin_api_… token here>" \
      --region us-east-1

Plus the matching `list-api-key-credential-providers` for verification.
Add a "Tooling note" at the bottom of the section explaining why the
plain AWS CLI is the right path here vs. the two `agentcore` CLIs.

Starlight mirror synced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(linear): pass runtimeUserId so AgentCore injects WorkloadAccessToken

Smoke on backgroundagent-dev caught a real bug in the Phase 2.0a
migration: the agent's `resolve_linear_api_token()` was correctly
calling `IdentityClient.get_api_key()` but failing earlier at
`BedrockAgentCoreContext.get_workload_access_token()` returning None.
The Linear MCP then loaded with an unresolved `${LINEAR_API_TOKEN}`
placeholder and 👀 didn't post.

Root cause (from reading bedrock-agentcore-sdk-python source):

The `WorkloadAccessToken` request header (which the runtime container
reads to populate `BedrockAgentCoreContext`) is only injected by
AgentCore Identity when `InvokeAgentRuntimeCommand` is called with
`runtimeUserId`. Per AWS docs at
https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-oauth.html:

  "Agent Runtime exchanges this token for a Workload Access Token via
   bedrock-agentcore:GetWorkloadAccessTokenForJWT API and delivers it
   to your agent code via the payload header `WorkloadAccessToken`."

Without `runtimeUserId`, AgentCore never derives a workload token and
the header is absent. `app.py::_build_request_context` reads the
header off the inbound request; the agent sees None.

Fix:

1. Thread `userId` through the `ComputeStrategy.startSession` interface
   (compute-strategy.ts).
2. Pass `task.user_id` (the task's Cognito sub) at the call site in
   orchestrate-task.ts.
3. Set `runtimeUserId: input.userId` on `InvokeAgentRuntimeCommand` in
   agentcore-strategy.ts. Log it alongside session_id for traceability.
4. ECS strategy accepts the new parameter to satisfy the interface;
   doesn't use it (ECS doesn't go through AgentCore Identity).
5. Grant the orchestrator role `bedrock-agentcore:InvokeAgentRuntimeForUser`
   alongside `InvokeAgentRuntime` (task-orchestrator.ts). Without this,
   the new `runtimeUserId` parameter would 403.

Tests updated:
- `agentcore-strategy.test.ts`: pin that `runtimeUserId` flows from
  input into the SDK command; pass `userId: 'cognito-user-1'` in 4 call
  sites.
- `ecs-strategy.test.ts`: pass `userId` (unused by ECS) on 3 call sites.
- `start-session-composition.test.ts`: pass `userId: 'cognito-test'` on
  3 call sites.
- `task-orchestrator.test.ts`: assert the IAM action list includes
  `InvokeAgentRuntimeForUser` (2 assertions).

542 agent / 1273 cdk / 196 cli — all green. Lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(linear): two undocumented gotchas to make AgentCore Identity actually work

End-to-end smoke on backgroundagent-dev surfaced two more silent failure
modes after the runtimeUserId fix landed:

`BedrockAgentCoreContext.get_workload_access_token()` returned None inside
the pipeline thread even though the platform delivered the token on the
inbound request. Cause: Python ContextVar storage is per-thread, not
shared across `threading.Thread` boundaries. Our `_run_task_background`
spawns a new thread for the pipeline, so any context-var the SDK's
middleware sets in the request handler thread doesn't reach it.

Compounding factor: the SDK's `_build_request_context` middleware only
runs when using `BedrockAgentCoreApp` from `bedrock_agentcore.runtime`.
Plain FastAPI apps like ours never get that bridge at all.

Fix: read the workload token off the request in `_extract_invocation_params`
(handling both observed header spellings — `WorkloadAccessToken` and
`x-amzn-bedrock-agentcore-runtime-workload-accesstoken`), thread it through
the kwargs of `_run_task_background`, and have the pipeline thread call
`BedrockAgentCoreContext.set_workload_access_token` on entry.

   (cdk/src/stacks/agent.ts)

After (1) was applied, `IdentityClient.get_api_key()` actually fired and
got `AccessDeniedException: ... not authorized to perform:
secretsmanager:GetSecretValue`.

Cause: AgentCore Identity stores api-key credentials in Secrets Manager
under reserved prefix `bedrock-agentcore-identity!*` (the actual ARN
shape: `arn:aws:secretsmanager:REGION:ACCOUNT:secret:bedrock-agentcore-
identity!default/apikey/<provider-name>-<hash>`). The `GetResourceApiKey`
control-plane API surfaces the underlying secret to the caller, and AWS
verifies the *caller* role (our runtime role) has `GetSecretValue` on
the actual secret resource — not the SLR.

Fix: grant the runtime role `secretsmanager:GetSecretValue` scoped to
the `bedrock-agentcore-identity!*` prefix in the current
account/region. Tightly scoped to Identity-managed secrets; doesn't
leak read access to other Secrets Manager resources.

- Runtime container reads workload token from request, propagates across
  thread boundary, calls IdentityClient successfully
- 👀 reaction posts at +525ms after task pickup, no warnings
- Linear MCP loads cleanly with the resolved token
- No more `workload access token not in context` WARN
- No more `AccessDeniedException` from `GetResourceApiKey`

Three undocumented requirements total for Phase 2.0a (combining with
the runtimeUserId fix from the prior commit):

  1. Caller (orchestrator) sends `runtimeUserId` and has
     `InvokeAgentRuntimeForUser` IAM
  2. Runtime container bridges the workload-token header into the
     ContextVar, with per-thread propagation if the pipeline runs in a
     spawned thread
  3. Runtime role has `secretsmanager:GetSecretValue` on
     `bedrock-agentcore-identity!*`

All three are silent failures on their own; missing any one returns None
or AccessDenied without obvious "you forgot X" diagnostics. Will file an
upstream issue against `aws/bedrock-agentcore-sdk-python` summarising
all three so others don't burn the same cycles.

Tests: 542 agent / 1273 cdk / 196 cli — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(2.0b): foundation — workspace registry, admin invite-user, Linear app template

Wave 1 of Phase 2.0b: prereq pieces for the Linear OAuth migration.

- LinearWorkspaceRegistryTable: maps Linear org-id → AgentCore credential
  provider name, so webhook + orchestrator Lambdas can resolve the
  workspace's OAuth token without knowing about provider naming.
- bgagent admin invite-user: wraps Cognito admin-create-user with the
  right defaults and prints a base64 bundle that --from-bundle decodes
  into ~/.bgagent/config.json. Replaces a four-flag dance with a single
  paste for joining teammates.
- bgagent linear app-template: prints the Linear OAuth app form values
  captured from the 2.0b spike — GitHub username with [bot] suffix and
  Webhooks ON gate the actor=app flow; misleading "Invalid redirect_uri"
  error is the symptom when either is missing.
- USER_GUIDE roles section + joining-an-existing-deployment flow: makes
  the four-role lifecycle explicit (stack admin / workspace admin /
  repo onboarder / teammate) so a teammate landing on the docs has a
  clear non-admin path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(linear): rewrite setup guide for OAuth (2.0b)

Replace the personal-API-key flow with the Linear OAuth `actor=app`
install path verified by the 2.0b spike. Major changes:

- Step 1: AgentCore credential provider via `bgagent linear
  oauth-register-workspace`, capturing the AWS-hosted callback URL
  that Linear will actually see.
- Step 2: Linear OAuth app creation via `bgagent linear app-template`,
  documenting the GitHub-username-with-[bot]-suffix and Webhooks-ON
  gates that produce Linear's misleading "Invalid redirect_uri" error
  when missing.
- Step 4: OAuth dance via the rewritten `bgagent linear setup` —
  ephemeral localhost HTTPS callback; no own ALB/Lambda needed since
  AWS proxies the OAuth flow.
- Step 7: clarify that the PAK-owner auto-link becomes the
  setup-runner auto-link; the manual DDB mapping path stays for now
  until self-service `@bgagent link` ships.
- New "Adding additional Linear workspaces" section for
  multi-workspace deployments.
- New "Migration from 2.0a (PAK) to 2.0b (OAuth)" runbook.
- Troubleshooting expanded to cover the Invalid-redirect_uri and
  401-from-Linear scenarios surfaced in the spike.

Notes the docs reference commands shipping in Wave 2 (aws-samples#63
oauth-register-workspace, aws-samples#65 setup wizard, aws-samples#67 add-workspace) — the
2.0b branch is a coherent unit and aws-samples#62 must land before those flows
are wired so the docs aren't a moving target during implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): workload-token retrieval helper for AgentCore Identity (2.0b C1)

The CLI runs OUTSIDE AgentCore Runtime, so the in-container ContextVar
trick from 2.0a does not apply. This module gives every 2.0b OAuth-flow
command a single way to obtain a workload access token:

- getWorkloadAccessToken({region, workloadName, userId}) calls the
  data-plane GetWorkloadAccessTokenForUserId, scoping the resulting
  token to (workload, cognito_sub) so OAuth-token retrieval is per
  platform user.
- decodeCognitoSub(idToken) extracts the sub claim from the cached
  id_token, parsing only — token validation is API Gateway's job.
- DEFAULT_CLI_WORKLOAD_NAME is the deployment-time convention; the
  workload identity itself will be created by a follow-up CDK custom
  resource (aws-samples#61). Stack output 'CliWorkloadIdentityName' wires the
  CLI to whatever the deployed name actually is.

Two SDK errors get translated into actionable remediation hints:
- ValidationException: WorkloadIdentity is linked to a service —
  documented footgun from the spike, surfaces when the CLI is
  pointed at a runtime workload.
- AccessDeniedException / ResourceNotFoundException — same surface
  treatment, with the bgagent-side checklist embedded in the message.

Adds @aws-sdk/client-bedrock-agentcore + bedrock-agentcore-control as
CLI deps. Pins all CLI AWS SDK clients to 3.1024.0 (matching) to keep
the @smithy/core dependency graph deduplicated; mixed-version pins
caused interface-collision typecheck errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): bgagent linear oauth-register-workspace (2.0b B2)

Registers a Linear workspace as an AgentCore OAuth2 credential provider.
The command:

- Validates the workspace slug shape ([a-zA-Z0-9_-]{4,50}) so the
  resulting provider name fits AgentCore's 64-char limit.
- Prompts for clientId + clientSecret (interactive, not echoed).
- Calls CreateOauth2CredentialProvider with credentialProviderVendor=
  'CustomOauth2' and explicit authorizationServerMetadata for Linear's
  fixed endpoints (Linear has no .well-known/openid-configuration, so
  vendor-discovery cannot auto-resolve).
- Prints the AWS-hosted callback URL the operator pastes into Linear's
  app form — the AWS-side proxy that Linear actually redirects to.
- Idempotent: re-running with an existing provider name fetches the
  callbackUrl and reports "already exists — re-using it".

Smoke test against dev account (2026-05-19) revealed AWS surfaces the
duplicate-name case as ValidationException (NOT ConflictException as
CFN/REST conventions would suggest). Detection is by message-substring
match; tests cover both the duplicate path and the "ValidationException
for a non-duplicate reason" path so we don't accidentally swallow input
validation errors.

AccessDeniedException gets a remediation hint pointing at the
'bedrock-agentcore:CreateOauth2CredentialProvider' permission, since
the most common misconfiguration is running the command as a
Cognito-authenticated CLI user (no permissions) rather than as an
admin/stack-deploy IAM principal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(2.0b): CLI workload identity + localhost OAuth callback server (A3)

Two pieces that together let the CLI run the OAuth dance without any
externally-facing infrastructure:

CDK side (CliWorkloadIdentity construct, wired into the agent stack):
- Creates a dedicated AgentCore Identity workload identity named
  `bgagent-cli`, distinct from the runtime workload (which is service-
  linked and cannot mint user-scoped tokens).
- Allowlists `https://localhost:8443/oauth/callback` as a permitted
  resourceOauth2ReturnUrl. AgentCore validates browser-redirect URLs
  against this list, so the CLI cannot finish the OAuth dance without it.
- Implementation: AwsCustomResource (no L2/L1 for AgentCore Identity in
  CDK as of May 2026). Idempotent — Create/Update/Delete lifecycle wired
  so re-deploys reconcile the allowlist and stack-deletes don't leak
  workload identities (50/account-region quota).
- Stack outputs `CliWorkloadIdentityName` and `LinearWorkspaceRegistryTableName`
  so the CLI can discover them at runtime.

CLI side (oauth-callback-server module):
- Generates a fresh self-signed cert in /tmp via openssl on each
  invocation; cert is cleaned up when the server shuts down.
- Starts an HTTPS listener on localhost:8443/oauth/callback, captures the
  first request's `session_id` query param, renders a success page,
  shuts down. Uses res.once('finish') to ensure the response body
  flushes before the listener closes — otherwise the browser hangs
  waiting for bytes that never arrive (caught by integration test).
- Translates EADDRINUSE and timeout into actionable CliErrors.

The CLI URL constant and the CDK default allowlist must agree on the
exact URL string — drift would silently break the OAuth dance with
"redirect_uri not allowlisted". A regression-locking test on the URL
constant + matching CDK default flags the issue at unit-test time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): bgagent linear setup OAuth dance orchestration (2.0b C2/C3)

Replaces the personal-API-key wizard with a 7-step OAuth flow that
authorizes a Linear workspace via AgentCore Identity:

  1. Resolve stack outputs (CliWorkloadIdentityName, registry table,
     user mapping table, webhook secret ARN). Errors loudly if any are
     missing — typically means the stack predates 2.0b.
  2. Read Cognito sub from cached id_token.
  3. Mint workload access token via getWorkloadAccessTokenForUserId.
  4. Initiate OAuth dance: getResourceOauth2Token returns an authorize
     URL + sessionUri. customParameters: {actor: 'app'} propagates so
     Linear surfaces the Agent install variant of the consent screen
     (verified via 2.0b spike).
  5. Start localhost HTTPS callback server, open browser to the auth
     URL, await session_id from the callback.
  6. Poll getResourceOauth2Token (5s/600s) until accessToken arrives;
     translate sessionStatus=FAILED into a Linear-app-config remediation
     hint.
  7. Query Linear viewer + organization with the OAuth token, persist
     the workspace registry row + admin user-mapping row, then prompt
     for the webhook signing secret if not already configured.

Hard cutover from PAK: the new wizard is OAuth-only — there is no
--use-pak flag. The webhook signing secret prompt remains because
HMAC verification of inbound Linear webhooks is independent of how the
agent calls Linear outbound. Webhook prompt is skipped on subsequent
add-workspace runs by detecting the lin_wh_ prefix on the stored
secret; --rotate-webhook-secret forces a re-prompt.

Splits queryLinearIdentity out so both the legacy PAK auto-link helper
(authorization=`lin_api_…`) and the OAuth path (authorization=
`Bearer <token>`) reuse the same GraphQL query. The PAK helper stays
exported to support the legacy linkage path until LinearApiTokenSecret
is retired in aws-samples#70.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cdk): use full SDK v3 package name for AgentCore Identity custom resource

CDK's AwsCustomResource auto-derives the SDK package name from `service`
by lowercasing and dropping hyphens — `'BedrockAgentCoreControl'` becomes
`@aws-sdk/client-bedrockagentcorecontrol`, which doesn't exist. The
actual package is `@aws-sdk/client-bedrock-agentcore-control` (hyphens).

Verified by deploy: with the lowercased mapping the Lambda backing the
CR fails with "Package @aws-sdk/client-bedrockagentcorecontrol does not
exist" and the stack rolls back. Switching to the full v3 package name
(supported per the AwsSdkCall.service jsdoc) routes the import correctly.

Verified end-to-end: `bgagent-cli` workload identity created with
`https://localhost:8443/oauth/callback` on the resourceOauth2ReturnUrls
allowlist, stack outputs populated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(2.0b): park diagnostic flags + tokenEndpointAuthMethods + force-reauth

Smoke test against backgroundagent-dev (2026-05-19) hit a service-side
bug in AgentCore Identity: USER_FEDERATION token-exchange against
Linear's /oauth/token never completes. sessionStatus stays IN_PROGRESS
indefinitely, no FAILED transition, no diagnostics on the wire.

Verified via manual curl that Linear's token endpoint works perfectly
with the same clientId/secret/scopes/code/actor=app — bug is on AWS's
side. AgentCore Identity has zero token-injection APIs, so Option 3
(do OAuth ourselves + inject) is architecturally impossible. AWS
support case + PAR-compatibility upstream issue
aws/bedrock-agentcore-sdk-python#111 are the official fix paths.

Parking the wizard work but committing the diagnostic flags we added
during triage so they're available when this is unparked:

- `tokenEndpointAuthMethods: ['client_secret_post']` on the provider
  metadata. Linear expects POST-body credentials; AgentCore defaults to
  Basic. Field name verified against the SDK type (`tokenEndpointAuthMethods`,
  not the `Supported` suffix the boto3 reference suggested).
- `--verbose-poll` flag on `bgagent linear setup` — prints per-poll
  sessionStatus + response keys so the stuck state is visible.
- `--force-reauth` flag — sets `forceAuthentication: true` on
  GetResourceOauth2Token to bypass cached tokens after a Linear-side
  revoke.
- `CompleteResourceTokenAuth` call between callback capture and poll
  loop. Per AWS sample 09-Outbound_Auth_Self_Hosted, this is required
  to bind the captured session to a userId. Confirmed it's NOT what
  unblocks our specific bug, but is correct per spec for any
  USER_FEDERATION flow.

Status of resume paths in memory/project_oauth_2_0b.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(2.0b-O2): direct Linear OAuth + per-workspace Secrets Manager

Replaces the parked AgentCore Identity OAuth flow with a CLI-side
direct OAuth dance against Linear's /oauth/token endpoint. The flow
verified by the manual curl smoke test on 2026-05-19 returned a valid
access_token in <100ms, so we know Linear's side works. AWS's
USER_FEDERATION wrapper is broken specifically for Linear (or
actor=app); see memory/project_oauth_2_0b.md for the parked-bug
details and resume prompt.

Architecture:
- New module cli/src/linear-oauth.ts owns the OAuth helpers:
  generatePkce (S256), buildAuthorizationUrl (with actor=app),
  exchangeAuthorizationCode, refreshAccessToken,
  StoredLinearOauthToken JSON shape, computeExpiresAt,
  isAccessTokenExpiring (60s threshold), linearOauthSecretName.
  19 hermetic tests (no network).
- Per-workspace Secrets Manager secret bgagent-linear-oauth-<slug>
  holds the token JSON. CLI creates+updates at runtime via upsertOauthSecret
  (CreateSecret + ResourceExistsException → PutSecretValue fallback).
- LinearWorkspaceRegistryTable row gains oauth_secret_arn. Lambdas
  resolve workspace → secret_arn → token JSON, with refresh-if-expiring.
  (Lambda migration is Wave C.)
- bgagent linear setup is rewritten end-to-end:
  prompt-for-credentials → PKCE → open browser → callback captures
  ?code+?state → state verify (CSRF) → exchangeAuthorizationCode →
  query Linear viewer+org → write secret + registry row + user mapping
  → webhook secret prompt (unchanged from prior wizard).
  No AgentCore calls. No polling. No CompleteResourceTokenAuth.
- Localhost callback server now exposes both AgentCore-style
  (session_id) and direct-Linear-style (code+state+error) shapes
  via a CallbackResult with nullable fields. Backward-compat with
  the parked AgentCore path's tests.

Removals:
- cli/src/agentcore-identity.ts + test (workload-token helper)
- cdk/src/constructs/cli-workload-identity.ts + test (workload identity)
- providerNameForWorkspace, buildLinearProviderInput,
  registerLinearWorkspace, initiateOauthDance, completeResourceTokenAuth,
  pollForOauthAccessToken, AgentCore SDK imports — all gone from linear.ts
- bgagent linear oauth-register-workspace command (no AWS-side provider
  to register; folded into setup)
- CliWorkloadIdentityName CfnOutput from agent.ts
- 6 describe blocks of AgentCore-flavored tests in linear.test.ts

Net change: -1100 lines, +700 lines of new direct-OAuth wiring.
286/286 CLI tests pass. 9/9 linear-integration CDK tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(2.0b-O2): space-separated OAuth scopes + --no-actor-app diagnostic

End-to-end smoke test against backgroundagent-dev (2026-05-20):

- The OAuth dance was failing with Linear's "Invalid redirect_uri" error
  even though the redirect_uri was correct. Root cause: scopes were
  comma-separated (`read,write,...`) instead of space-separated. RFC
  6749 §3.3 mandates space; Linear surfaces the violation as the
  misleading "Invalid redirect_uri" error, the same misdirection we
  hit during the 2.0b spike. Fix: `.join(' ')` in buildAuthorizationUrl.
- Adds `--no-actor-app` diagnostic flag on `bgagent linear setup`. Drops
  the `actor=app` query param so a stuck flow can be isolated to
  agent-install vs vanilla-OAuth without changing the Linear app config.
  Off by default; surfaces a warning when invoked.

After the fix, full smoke test passed:
- Browser opens to Linear consent
- User authorizes, redirects to https://localhost:8443/oauth/callback
- CLI captures code+state, exchanges for access_token + refresh_token
- Token JSON persisted to bgagent-linear-oauth-maguireb in Secrets Manager
- LinearWorkspaceRegistryTable row written with oauth_secret_arn
- LinearUserMappingTable row written for the admin
- Token verified against Linear's GraphQL viewer query (works)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(2.0b-O2): Wave C — migrate Lambdas + agent runtime to per-workspace OAuth

Replaces every consumer of the legacy LinearApiTokenSecret PAK with the
per-workspace Secrets Manager OAuth-token pattern from Waves A/B. Deploy
of this commit will fully cut over the integration; the LinearApiTokenSecret
construct is gone.

CDK side:
- New `cdk/src/handlers/shared/linear-oauth-resolver.ts` resolves
  workspace_id → registry row → oauth_secret_arn → token JSON →
  refresh-if-expiring → access_token. In-memory caches (1m TTL) on
  both registry rows and token JSON. Lazy refresh with PutSecretValue
  write-back so concurrent Lambdas see the rotated token. 11 unit tests.
- linear-feedback.ts: postIssueComment / addIssueReaction /
  reportIssueFailure now take a {linearWorkspaceId, registryTableName}
  context instead of an apiTokenSecretArn. Auth header switches from
  bare PAK value to `Bearer ${accessToken}`.
- linear-webhook-processor.ts: env vars LINEAR_WORKSPACE_REGISTRY_TABLE_NAME
  replace LINEAR_API_TOKEN_SECRET_ARN. safeReportIssueFailure threads
  the webhook payload's organizationId through to the resolver. Webhook
  processor now stamps `linear_oauth_secret_arn` + `linear_workspace_slug`
  into channel_metadata at task-creation time so the agent runtime can
  fetch the secret directly without a registry round-trip.
- orchestrate-task.ts: notifyLinearOnConcurrencyCap reads
  LINEAR_WORKSPACE_REGISTRY_TABLE_NAME and the task's
  channel_metadata.linear_workspace_id.
- LinearIntegration construct: drops apiTokenSecret + ApiTokenSecret
  Secrets Manager resource entirely. Webhook processor IAM now grants
  Get+Put on `bgagent-linear-oauth-*` Secrets Manager prefix.
- Agent stack: orchestrator IAM mirrors the new prefix grant.
  Runtime IAM drops AgentCore Identity grants and gains Get+Put on
  `bgagent-linear-oauth-*`. LINEAR_API_KEY_PROVIDER_NAME env var,
  LINEAR_API_TOKEN_SECRET_ARN env var, and LinearApiTokenSecretArn
  CfnOutput all removed.

Agent side (Python):
- config.py::resolve_linear_api_token: rewritten to read the per-task
  channel_metadata.linear_oauth_secret_arn (or LINEAR_OAUTH_SECRET_ARN
  env fallback) via boto3.secretsmanager. Lazy refresh: if expires_at
  is within 60s, POST refresh_token grant to Linear /oauth/token using
  client_id/client_secret co-located in the secret JSON, write the
  rotated token back via put_secret_value, return the new access_token.
- pipeline.py: passes config.channel_metadata into resolve_linear_api_token.
- linear-oauth.ts (CLI): StoredLinearOauthToken schema gains client_id +
  client_secret fields so Lambda + agent refresh can run without
  per-Lambda OAuth env vars. Setup wizard writes them.

Tests pruned of AgentCore Identity mocks; new tests cover the
Secrets-Manager-direct path (CDK 11 + agent 6 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(orchestrator): bundle import.meta.url shim for durable-execution SDK

`@aws/durable-execution-sdk-js@1.1.3`'s ESM build calls
`fileURLToPath(import.meta.url)` at module load. esbuild's ESM→CJS
bundling leaves `import.meta.url` undefined, crashing every invocation
with `TypeError: path must be a string`.

Define an identifier substitution + banner that materializes a valid
file:// URL from `__filename` at runtime. Discovered while smoke-testing
Wave C end-to-end on backgroundagent-dev.

Refs: aws/aws-durable-execution-sdk-js#543

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): switch OAuth callback to plain HTTP localhost

Per RFC 8252 §7.3, OAuth providers (including Linear) treat
http://localhost as a special case that doesn't need TLS — the
connection never leaves the host. The previous self-signed-cert HTTPS
approach forced testers through a "connection not private" warning that
scared them off mid-setup.

Drops the openssl shell-out + temp-cert plumbing (~60 lines) along with
the user-facing warning copy in `bgagent linear setup`. Updates the
callback constants to http://localhost:8080/oauth/callback and the test
suite to plain http.GET.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(agent): ruff lint + format for OAuth refresh path

Five lint errors surfaced when CI ran `ruff check --fix` against the
Wave C agent changes:

- F401 unused `timezone` import in `config.py` (replaced with
  `timedelta`, which is what's actually needed)
- RUF034 useless if-else in the `expires_at` ternary — both branches
  returned identical strings before the recompute below; flatten into
  a single straightforward `if expires_in: ... else: ...` block
- E501 three line-length violations in `config.py` and
  `test_config.py` — break the long expressions onto helper-named
  intermediates

Confirmed locally: `ruff check .` clean, `ruff format --check` clean,
`pytest tests/test_config.py` 15/15 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(linear-feedback): rewrite tests against OAuth context signature

Wave C migrated postIssueComment / addIssueReaction / reportIssueFailure
from a (secretArn: string, ...) signature to a (ctx: LinearFeedbackContext,
...) signature, but the test file still passed bare strings — TypeScript
caught it at compile time only when CI ran a full build. Three test
suites failed to compile (the typecheck error blocked the whole suite,
not just this file).

- Mock `resolveLinearOauthToken` (the new resolver) instead of
  `getLinearSecret` (the old PAK fetcher).
- Build a `LinearFeedbackContext` fixture with linearWorkspaceId +
  registryTableName, pass it everywhere SECRET_ARN was used.
- Update the Authorization-header assertion to match the new
  `Bearer <token>` form (PAK was bare-token; OAuth is Bearer-prefixed).

All 41 tests across linear-feedback, linear-webhook-processor, and
orchestrate-task-feedback pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cdk): align yarn.lock with upstream main + bump table count for OAuth registry

Two CI failures came together because they share a root cause: this
branch's yarn.lock had drifted from upstream main during interim
re-resolves, leaving an inconsistent dep tree that broke ts-jest's
module resolution for @aws-cdk/mixins-preview/aws-bedrockagentcore.
Restoring upstream main's yarn.lock fixes the resolution; the
agent.test.ts table-count assertion then needs to bump from 12 to
13 to account for the LinearWorkspaceRegistryTable added in
Phase 2.0b Wave A4.

Verified locally: agent.test.ts (44/44) and github-tags.test.ts
(5/5) both pass after the changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style: apply eslint --fix from CI's self-mutation guard

CI runs `mise run build`, which invokes `eslint --fix` and then fails if
the working tree changed (self-mutation guard). Three cosmetic lints
needed applying:

- Import-order: DynamoDBClient and CliError moved earlier in their
  files to satisfy alphabetic-by-package ordering
- formatJson import added in alphabetic position in linear.ts
- Three template literals with no interpolation converted to
  single-quoted strings in oauth-callback-server.ts and linear.ts
  (eslint quotes rule prefers single-quotes when no template
  variables are used)

Pure mechanical fixes; no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(oauth): refresh-token race recovery + log gaps from review

Addresses the blocker + critical items from PR review:

- Refresh-token race (review blocker). Linear rotates refresh_tokens
  on every use; concurrent Lambdas/agents racing the same secret will
  all read the same expiring token and one's refresh will succeed
  while the others get `invalid_grant`. On `invalid_grant`, re-read
  the secret from Secrets Manager (bypassing cache). If the
  refresh_token has changed, another caller already rotated; use the
  freshly-read token (or retry refresh once if it's also expiring).
  If unchanged, the refresh_token is permanently rejected and the
  workspace needs re-onboarding. Implemented in both the TS resolver
  (linear-oauth-resolver.ts) and Python resolver (config.py).

- Unguarded bedrock_agentcore import in agent/src/server.py
  (review critical). The bare `from bedrock_agentcore.runtime.context
  import BedrockAgentCoreContext` inside `_run_task_background` killed
  the entire pipeline thread with no diagnostic if the SDK was
  missing or its module structure changed. Wrap in
  try/except (ImportError, AttributeError) and log via _warn_cw —
  the Linear token resolver has its own SM fallback, so the agent
  can proceed without the workload-token bridge.

- Cache invalidation on fetch-level refresh failure (review high).
  The TS resolver's `invalidateLinearOauthCache()` only ran in the
  `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the
  catch returned null without invalidating, leaving the stale
  expiring token cached for 60s and hammering Linear's token
  endpoint. Move invalidate into the fetch-level catch too.

- Malformed expires_at log (review medium). The Python `_is_expiring`
  caught `ValueError` and silently returned True, masking
  consistently-bad writes. Add a WARN log so operators see the bad
  data instead of just an unexplained refresh on every task.

- Positive-path refresh log (review non-blocking aws-samples#5). Added
  INFO-level breadcrumb on successful refresh in both resolvers
  so operators diagnosing intermittent 401s have a trace of which
  workspace refreshed and to what expiry.

11/11 existing resolver unit tests still pass; will add tests for
the new race-recovery branch in a followup commit.

* fix(2.0b-O2): review-2 batch — error specificity, half-creates, runbook

Addresses four PR review items focused on operator UX when things go
sideways:

- isWebhookSecretConfigured (review high). The bare
  `catch { return false }` swallowed AccessDeniedException and
  DecryptionFailureException, making setup re-prompt for a webhook
  secret when the real problem was IAM. Now: only
  ResourceNotFoundException returns false; everything else throws a
  CliError pointing the operator at the IAM gap. Test updated to
  assert both paths.

- admin invite-user half-create (review medium). If
  AdminCreateUser succeeds but AdminSetUserPassword fails (stricter
  password policy than generator, partial IAM grant on the Set verb),
  the user was left in FORCE_CHANGE_PASSWORD with no diagnostic. Wrap
  the second call in try/catch and throw a CliError that names the
  user, explains the broken state, and gives both a delete-user CLI
  and a manual-fix path.

- PAK migration runbook (review non-blocking #1). Expanded the
  "Migration from 2.0a (PAK) to 2.0b (OAuth)" section in
  LINEAR_SETUP_GUIDE.md with: a pre-deploy checklist, what survives
  the migration vs what doesn't, an explicit rollback note (fix
  forward; the original PAK secret is gone with the CFN resource),
  and the per-step difference between 2.0a-with-Identity (skipped) vs
  2.0a-with-PAK (migrate) deploys.

- Vestigial AgentCore Identity dep (review non-blocking #2).
  bedrock-agentcore==1.9.1 is kept in agent/pyproject.toml because
  the workload-token bridge in server.py still calls it (now wrapped
  in try/except per review batch 1). Add an inline comment
  explaining why it's pinned even though Phase 2.0b-O2 reads
  Secrets Manager directly — it's the seam for resuming the
  AgentCore Identity path in 2.0c.

CLI tests: 13/13 pass.

* fix(2.0b-O2): batch 3 — schema validation, CallbackResult union, parity contract

Three remaining substantive review items from PR aws-samples#160:

- Validate all 11 fields in getOauthSecret (review non-blocking aws-samples#7).
  Was checking only access_token / refresh_token / expires_at; missing
  client_id or client_secret only surfaced 24h later when the refresh
  call needed them and found undefined. Extracted the required-field
  list into a const next to the StoredOauthToken interface and check
  the full set at deserialization. Bad secrets fail fast at fetch
  time with a structured log line naming the missing fields.

- CallbackResult discriminated union (review non-blocking aws-samples#6). Was
  `{ sessionId: string|null, code: string|null, state: string|null }`
  which let callers construct unreachable shapes. Split into
  `{ kind: 'agentcore', sessionId } | { kind: 'direct-oauth', code, state }`.
  Updated the resolver site (`oauth-callback-server.ts`), the
  consumer (`bgagent linear setup`), and the test file to use
  exhaustive type-narrowing. The setup wizard now errors clearly if
  it gets the agentcore shape (parked path) instead of silently
  passing nulls down.

- Cross-language schema-parity contract test (review non-blocking #3).
  CLI's StoredLinearOauthToken and Lambda's StoredOauthToken define
  the same JSON-in-Secrets-Manager schema independently; drift
  between the two would be a silent bug (CLI writes one field name,
  Lambda reads another, refresh works, every Lambda invocation logs
  a missing-field error). New test in
  `cdk/test/contracts/stored-oauth-token-parity.test.ts` regex-parses
  both interface definitions out of source and asserts the field set
  is equal. Also asserts the new
  `STORED_OAUTH_TOKEN_REQUIRED_FIELDS` const matches the interface,
  so future field additions can't drift between the validator and
  the type.

CLI tests 286/286 pass. CDK resolver + contract 13/13 pass.

* style: apply eslint --fix indentation on CallbackResult union

* fix(2.0b-O2): review-3 batch — defensive error handling, security hardening, refresh test coverage

Bugs (B1-B3):
- linear-oauth-resolver: try/catch around ddb.send() in getRegistryRow so
  transient DDB errors fail the resolver cleanly instead of crashing the
  Lambda thread
- orchestrate-task: try/catch around reportIssueFailure in
  notifyLinearOnConcurrencyCap; Linear feedback failures must never block
  the rejection path
- Python _fetch_token: guard json.loads + KeyError so a corrupted SM
  payload returns None (logged ERROR) rather than raising

Tests (T1-T3):
- Python: TestResolveLinearApiTokenRefreshPaths covering happy refresh,
  invalid_grant + concurrent rotation, invalid_grant + no rotation,
  malformed expires_at, network failure during refresh, and corrupted
  secret JSON
- TS: concurrent-refresh recovery via re-read; permanent-rejection on
  same refresh_token; cache invalidation after network failure

Security (S1-S3):
- agent runtime IAM: drop secretsmanager:PutSecretValue on the Linear
  OAuth secret prefix. Untrusted repo code in the agent must not be
  able to overwrite tokens; Lambdas (trusted) handle persistence. The
  refreshed in-memory token still works for the current task; rotated
  refresh_token is lost on agent exit but Linear's grace window
  absorbs the rare race where the agent refreshes strictly before any
  Lambda
- Python _try_refresh_once: narrow except Exception to
  (urllib.error.URLError, OSError) — programmer errors propagate with
  clean stack traces instead of being swallowed
- linear-oauth-resolver: RegistryRowStatus is now a discriminated
  'active' | 'revoked' literal; missing or unknown values fail closed
  to revoked rather than defaulting active

* fix(2.0b-O2): use email.message.Message for HTTPError hdrs in tests

ty's stricter typeshed flagged dict[Unknown, Unknown] passed where
Message[str, str] is expected. Drop-in replacement that satisfies
both ty and runtime.

* fix(2.0b-O2): update feedback test for B2 — helper now swallows internally

Round-3 review B2 moved the try/catch into notifyLinearOnConcurrencyCap.
The pre-existing test asserted the old contract (rejection propagates,
caller must catch). Flip the contract assertion to reflect the new
behavior: the helper is now best-effort end-to-end and returns undefined
on internal failure.

---------

Co-authored-by: bgagent <bgagent@noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alain Krok <alkrok@amazon.com>
krokoko pushed a commit that referenced this pull request Jun 9, 2026
…ompt, cap, parity (#248)

#5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload
failure AFTER run_agent mutated/committed the tree stranded the work as FAILED
with no PR. read_only now comes from config (already fallback-computed and the
verdict that drove Cedar); the reload — needed only for the ensure_pr strategy —
is wrapped in the same WorkflowValidationError fallback build_config uses,
defaulting to "create" so the PR still opens.

#8 — knowledge/web-research-v1 had no registered prompt and silently degraded to
the generic default-agent prompt. Add a research-specialized, repo-less prompt
(prompts/web_research.py) and register it.

#9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full
copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap
actually bounds memory on the MicroVM before materializing the bytes.

#10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID
/ DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform
default and be requires_repo:false in the YAML; the coding default must be
repo-bound. Closes the last hand-maintained-copy axis #10 named.

Perf — memoize load_workflow (@cache): files are image-baked and immutable per
process and Workflow is frozen, so the 3-4 loads per task now parse once.

Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent
while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a
larger half-migrated-runner refactor both reviewers flagged as non-blocking.
krokoko added a commit to ayushtr-aws/sample-autonomous-cloud-coding-agents that referenced this pull request Jun 10, 2026
…rkflows (aws-samples#248) (aws-samples#296)

* Initial design docs

* docs(workflows): address principal-review findings on ADR-014

- Validator parity: declare JSON Schema as the single canonical shape
  contract (consumed, not re-implemented), keep exactly one cross-field
  validator impl in Phases 1-3, and add a contracts/workflow-validation/
  golden corpus to lock any Phase-4 second impl to parity (mirrors the
  cedar-parity mechanism).
- Cedar principal migration: split into its own isolated PR (Phase 2a)
  reviewed alone with regenerated parity fixtures, landing ahead of the
  pr_iteration/pr_review workflow migrations (Phase 2b) that depend on it.
- Repo-optionality: reframe web_research as a not-yet-runnable schema
  fixture (not an acceptance proof) and require the two blocking open
  questions (memory actorId, artifact delivery) to be resolved as a
  recorded ADR addendum before the Phase-0 schema is frozen.

Regenerated Starlight mirrors.

* feat(agent): workflow models + loader (aws-samples#248 Phase 1)

Add the agent/src/workflow/ package: Pydantic models mirroring
workflow.schema.json and a loader that resolves a first-party workflow
id, parses YAML, and shape-validates against the canonical JSON Schema.
Additive only — does not yet touch task_type (the breaking cutover lands
later in Phase 1).

- models.py: Workflow + sub-models; resolved_requires_repo applies the
  domain-derived default (coding=true, knowledge/hybrid=false).
- loader.py: JSON-Schema shape validation (one canonical contract,
  consumed not re-implemented), YAML load, id/path agreement, and
  path-traversal defense. Cross-field rules deliberately live in the
  separate validator (Task #2), so there is one cross-field impl.
- Promote pyyaml + jsonschema to direct deps (were only transitive); the
  loader must not depend on another package's transitive pin.
- Fix a latent schema bug found by the new tests: the requires_repo/
  read_only allOf conditionals fired vacuously when the property was
  absent, wrongly applying repo-less constraints to a coding workflow
  relying on the domain default. Guard each `if` with `required`.

17 new tests; agent ruff + ty + full suite (836) green.

* feat(agent): cross-field workflow validator + golden corpus (aws-samples#248 Phase 1)

The single CI-time implementation of the WORKFLOWS.md cross-field rules
the JSON Schema cannot express (rules 1-9, 11, 12, 14). The runtime loader
stays shape-only and trusts this verdict, so there is exactly one cross-field
implementation in Phases 1-3 (avoids the cedar-style two-language drift).

contracts/workflow-validation/ ships the golden parity corpus from Phase 1 so
the expected-verdict contract is fixed before aws-samples#246 adds a second validator,
mirroring contracts/cedar-parity/. test_workflow_tree_valid.py gates every
first-party workflow file through the validator at CI time.

* feat(agent): workflow step runner + handler registry (aws-samples#248 Phase 1)

The agent-side step runner that interprets workflow.steps inside the container
(ADR-014): a StepKind→handler registry, in-order execution honouring each step's
on_failure policy (fail / continue / skip_remaining), per-step
step:<name>:start/<status> progress milestones, and resume-aware checkpointing
(deterministic side-effect-free steps recorded to workflow_state.json on the
persistent mount are skipped on resume; agentic/side-effecting steps re-run
idempotently per WORKFLOWS.md §Step execution semantics).

Handlers are thin wrappers over the existing helpers (setup_repo, run_agent,
verify_build/lint, ensure_pr) so this is maximal structural change, minimal
logic change. post_review/deliver_artifact are registered but raise
NotImplementedError (Phase 2b / Phase 3) — fail-loud, not silent no-op, keeping
validator rule-8 handler parity honest. Orchestration core is unit-tested with
fakes; wiring into pipeline.run_task is task 5.

* feat(agent): first-party workflow files — coding/new-task-v1 + default/agent-v1 (aws-samples#248 Phase 1)

Migrates the new_task task type to coding/new-task-v1.yaml (the heavyweight
clone→build→open-PR coding path) and ships the platform default workflow
default/agent-v1.yaml — the conservative last rung of the resolution ladder
used when no workflow_ref and no Blueprint default apply (requires_repo:false,
read-leaning tool set, deliver-as-comment). pr_iteration/pr_review migrations
are Phase 2b (depend on the Cedar principal migration).

Both pass the cross-field validator and the test_workflow_tree_valid CI gate
(file path agrees with declared id; zero violations).

* fix(agent): address code-review findings in workflow validator + runner (aws-samples#248 Phase 1)

Fixes the group-A bugs surfaced by the high-recall code review of the Phase 1
commits:

runner.py
- Resume-skip product loss: clone_repo/hydrate_context populate in-memory
  products (ctx.setup, ctx.user_prompt) that can't be rebuilt from the JSON
  checkpoint, so skipping them on resume left ctx.setup None (breaking ensure_pr)
  and ctx.user_prompt empty (unguided agent). Narrow _RESUMABLE_SKIP_KINDS to
  verify_build/verify_lint — steps whose ENTIRE product is the checkpointed
  boolean re-applied via ctx.artifacts. clone_repo/hydrate_context now re-run on
  resume (handler-level idempotency), matching WORKFLOWS.md's intent.
- Skipped steps now emit step:<name>:start / :skipped milestones so a watcher
  sees them accounted for instead of a gap.

validator.py
- _HANDLER_KINDS (rule 8) is now derived from runner.STEP_HANDLERS — the single
  source of truth — instead of a hand-copied list that could silently drift.
- rule-6 tier ceiling now applies the elevated-only-fields check to read-only
  (the strictest tier) as well as standard; previously read-only could declare
  mcp_servers/plugins/skills unchecked.
- rule-11 now verifies a deliver_artifact step's target actually produces the
  declared comment/artifact outcome, not merely that the step kind is present.

Tests + corpus updated; rule3/rule7 fixtures switched to s3_and_comment target
to stay isolated to their intended rule, plus a new rule11 target-mismatch
fixture. Full agent suite green (909 passed), ruff + ty clean.

* feat(agent): wire workflow runner into pipeline behind a gate (aws-samples#248 Phase 1, task 5)

Builds the runner→pipeline seam and gates it off (default production behavior
unchanged). pipeline.run_task's agent invocation now goes through
_execute_agent_step: when WORKFLOW_RUNNER_ENABLED is set AND task_type maps to a
shipped workflow (only new_task→coding/new-task-v1 in Phase 1), the single
run_agent step is dispatched through workflow.run_workflow — exercising the real
handler registry, step milestones, and result threading — while clone, context
assembly, and post-hooks stay on the proven inline path. Otherwise it is exactly
the legacy asyncio.run(run_agent(...)) call. pr_iteration/pr_review have no
workflow file until Phase 2b, so they fall back to inline.

run_workflow gains an only_kinds filter so the seam drives just the run_agent
step (no double clone / double PR against the inline post-hooks). The flag is
flipped to default-on in the tasks 6-8 cutover.

Folds in the group-B code-review fixes (effective when the seam is enabled):
- system prompt: hydrate_context builds ctx.system_prompt via the existing
  build_system_prompt (was empty), reusing pipeline's logic rather than
  reimplementing it.
- verify gate: new shared _gate_status gives verify_build/verify_lint matching
  semantics — informational/read_only never gate, regression_only consults
  build_before/lint_before (mirrors pipeline's build_ok = passed or not
  build_before), strict always gates. Fixes the regression_only build bug and
  the verify_lint read_only asymmetry.
- clone_repo is idempotent (reuses a pre-populated ctx.setup).
- StepContext threads the --trace trajectory into run_agent (was dropped).

ensure_pr strategy reconciliation stays deferred to task 8 (it requires removing
the task_type branch inside ensure_pr); recorded in local-docs. Full agent suite
green (930 passed), ruff + ty clean.

* fix(agent): address second-round review findings on the workflow seam (aws-samples#248 Phase 1)

- Error-handling contract (highest severity): when the run_agent step's handler
  raises, run_workflow captures it into a failed StepOutcome rather than
  propagating. _execute_agent_step now RE-RAISES in that case so run_task's
  except block restores full fidelity — the log_error_cw APPLICATION_LOGS mirror
  (read by TaskDashboard / bgagent status), the span error, and the real
  exception text — instead of silently downgrading to a generic AgentResult.
  This also makes the previously-dead agent_result-is-None branch live.

- _gate_status: an unset gate now defaults to regression_only semantics (gate
  only a regression; a pre-existing failure does not gate), matching pipeline.py
  which is unconditionally 'build_ok = passed or not build_before'. Previously an
  unset gate fell through to strict, contradicting the docstring's 'mirrors
  pipeline.py' claim and would wrongly fail a broken-before repo.

- run_agent handler fails loud on an empty system prompt rather than running an
  unguided agent loop (repo-less prompt assembly is Phase 3; this turns the gap
  into an attributable failed step instead of a silent context-free run).

- _workflow_id_for_task_type: documented as a transitional bridge tied to the
  canonical task_type→workflow table in WORKFLOWS.md / API_CONTRACT.md, to be
  kept in sync until tasks 6-8 delete it.

Tests added for each behavior change. Full agent suite green (934 passed),
ruff + ty clean.

* feat(api,cli): replace task_type with workflow_ref/resolved_workflow (aws-samples#248 tasks 6-7)

Breaking API change (no alias): task_type is removed end-to-end; workflow_ref
selects a workflow and resolves to a pinned {id, version} at the create-task
boundary.

CDK/CLI types (task 6, in lockstep — check:types-sync):
- Remove TaskType + isPrTaskType; add ResolvedWorkflow.
- CreateTaskRequest.workflow_ref?; TaskRecord/TaskDetail/TaskSummary
  resolved_workflow (+ mappers).
- CLI: --pr/--review-pr now set workflow_ref (coding/pr-iteration-v1,
  coding/pr-review-v1); new --workflow flag; format reads resolved_workflow.id.

CDK resolution boundary (task 7):
- New workflows.ts: CDK-side descriptor mirror of agent/workflows/** +
  resolveWorkflowRef (ladder: explicit ref → default/agent-v1), isValidWorkflowRef,
  requires_repo/read_only/uses_pr accessors. Drift-guard test keeps the table in
  sync with the shipped YAML.
- validation.ts: drop VALID_TASK_TYPES/isValidTaskType; hasTaskSpec now takes the
  resolved workflow's required_inputs contract.
- create-task-core.ts: resolve workflow, validate inputs against the contract,
  persist workflow_ref + resolved_workflow. Repo stays required for ALL workflows
  in Phase 1 (repo-less admission is Phase 3).
- preflight.ts: taskType param → readOnly + requiresRepo; repo-less short-circuits
  to passed (seam for Phase 3).
- orchestrate-task/context-hydration/orchestrator/ecs-strategy: derive
  read_only/requires_repo/PR-ness from resolved_workflow; payload swaps task_type
  → resolved_workflow. ecs bootCommand passes resolved_workflow (lockstep with
  agent run_task in task 8).

CLI 294 tests + CDK suite green; check:types-sync OK. Agent cutover (task 8) lands
next to consume resolved_workflow.

* feat(agent): complete task_type→workflow cutover in the agent (aws-samples#248 task 8)

Removes the Python TaskType enum / PR_TASK_TYPES / _PROMPTS-by-task_type and the
gated seam; the workflow runner is now the sole agentic path, driven by the
resolved_workflow {id, version} threaded from the create-task boundary.

- models.py: delete TaskType enum; TaskConfig gains resolved_workflow,
  policy_principal, is_pr_workflow (drops task_type).
- config.py: build_config takes resolved_workflow (not task_type); derives
  policy_principal via workflow.policy_principal_for and is_pr_workflow; PR_TASK_TYPES
  → PR_WORKFLOW_IDS.
- Cedar UNCHANGED (Phase 2a owns the principal migration): the
  Agent::TaskAgent::"<id>" scheme and contracts/cedar-parity/ are untouched.
  policy_principal_for maps read_only⇒pr_review, else id→legacy {new_task,
  pr_iteration, pr_review}. Only policy.py edit: drop the WARN-only TaskType()
  validation. runner.py passes config.policy_principal as the principal. All
  test_policy* pass unchanged.
- prompts rekeyed by workflow id (get_system_prompt falls back to the default
  coding prompt for an unknown id rather than raising).
- pipeline.py: _execute_agent_step loads from resolved_workflow, always-on;
  the 3 task_type=='pr_review' branches → workflow.read_only; ensure_pr takes an
  explicit strategy (create/push_resolve/resolve) from the workflow step,
  resolving the deferred code-review item.
- post_hooks.ensure_pr: strategy param replaces task_type self-inspection.
  repo.py: PR-branch resume keys off config.is_pr_workflow.
- server.py/entrypoint.py: thread resolved_workflow; validation keyed on PR
  workflow ids.
- New faithful Phase-1 workflow files coding/pr-iteration-v1 (push_resolve) +
  coding/pr-review-v1 (read_only, resolve, no Write/Edit per rule 4). All 4
  first-party files validate; CDK descriptor drift-guard green.
- docs: API_CONTRACT.md migration table + resolved_workflow responses; Starlight
  mirror synced.

Agent suite 922 passed, ruff + ty clean. With tasks 6-7 (e829c4f) this completes
the breaking cutover across API + CLI + agent.

* fix(agent): use ty ignore syntax + annotate workflow test handlers (aws-samples#248)

The full build was never completed on this branch; ty type-checking and ruff
format gates were red. No src logic changes — type/format hygiene only.

- Tests used mypy-style `# type: ignore[arg-type]`, but the project type
  checker is ty (`# ty: ignore[rule]`), so the suppression silently no-op'd.
  - test_workflow_runner.py: annotate inline step handlers as
    StepHandler-compatible `(step: Step, ctx: StepContext) -> StepOutcome`;
    type the `handlers` dicts as `dict[str, StepHandler]` (ty treats the alias
    params as positional-only → dict-value invariance rejects named-param
    funcs); convert the stale ignore to `# ty: ignore[invalid-argument-type]`.
  - test_entrypoint.py: assert `resolved_workflow is not None` before
    subscripting (TaskConfig.resolved_workflow is `dict | None`).
- Apply ruff format reflows (runner.py, validator.py,
  test_workflow_tree_valid.py) the branch was committed without.

* feat(agent): Cedar read-only enforcement keys off context.read_only (aws-samples#248 Phase 2a)

Migrate read-only enforcement from the literal `Agent::TaskAgent::"pr_review"`
principal match to the `context.read_only` attribute, so the Write/Edit
hard-deny attaches to the *property* and fires for every read-only workflow
uniformly — not just coding/pr-review. Removes the Phase-2b principal bridge.

This is the security-load-bearing step: an error here *silently weakens*
enforcement (the rule stops matching) rather than failing loudly. Guarded by
new contracts/cedar-parity/ golden fixtures verified on BOTH the cedarpy
(Python) and cedar-wasm (TypeScript) engines.

Policy (both bindings, kept byte-for-byte identical — drift guard enforces):
- agent/policies/hard_deny.cedar + cdk/.../builtin-policies.ts:
  pr_review_forbid_write/edit (principal == "pr_review") →
  read_only_forbid_write/edit (when context.read_only == true, any principal).

Wiring:
- policy.py: PolicyEngine gains read_only kwarg; threads read_only into every
  Cedar request context (probe + base_context).
- models.py: TaskConfig.read_only; config.py derives it from the resolved
  workflow; runner.py passes config.read_only to PolicyEngine.
- workflow/loader.py: remove the read_only ⇒ "pr_review" bridge in
  policy_principal_for — the principal is now an identity/audit tag only;
  pr-review keeps its own id-derived principal.

Parity fixtures (new): read-only-forbid-write, read-only-forbid-edit
(read_only=true ⇒ deny), read-only-false-permits-write (read_only=false ⇒
base_permit — proves the deny is gated on the property, not always-on).

Tests: TestPrReviewPermissions → TestReadOnlyPermissions, now asserting the
deny fires for any read-only principal AND that read_only=false permits Write;
test_hooks + test_config updated. agent 927 / cdk 1822 green; ty + ruff clean.

Docs: ADR-014 addendum (2026-06-08) records dropping the isolated-PR/ordering
requirement (2b shipped first behind the bridge, so read-only was never
unprotected); WORKFLOWS.md §"Replacing the Cedar principal" + phasing table
updated to past tense.

* docs(workflows): resolve repo-optional open questions + freeze schema (aws-samples#248)

ADR-014 addendum (2026-06-08) settles the two open questions that gated the
Phase-0 schema freeze (WORKFLOWS.md open questions #1, #2). With the one implied
schema reshape applied, the schema is frozen — later phases add handlers and
plumbing, not schema fields.

Decision 1 — Memory actorId for repo-less tasks: per-user user:{cognito_sub}.
Caller-scoped (no cross-tenant bleed; mirrors the per-user trace prefix); cross-
workflow pooling explicitly not adopted. NO schema field — a fixed platform
fallback, applied in memory.py in Phase 3 when repo is absent.

Decision 2 — Artifact delivery: named Python deliverers (open target string),
shared S3 plumbing pinned now. deliver_artifact.target resolves to a registered
deliverer (workflow/deliverers.py → DELIVERERS), same pattern as STEP_HANDLERS —
a new delivery method is a registered deliverer, not a schema change. Plumbing
frozen: task-scoped key artifacts/{task_id}/, prefix-scoped SessionRole grant,
size limit, TaskDetail URL surfacing, workflow:{id} repo-less tenant tag.

Schema reshape applied:
- workflow.schema.json: steps[].target drops its enum (stays type:string,
  minLength:1); the closed set moves into the deliverer registry.
- models.py: DeliverTarget Literal → str.
- New workflow/deliverers.py: Deliverer dataclass + DELIVERERS registry +
  produced_outcomes(); single source of truth for "what each target produces."
- validator.py rule 11: consults the registry (_deliverer_produces) instead of
  the hardcoded _DELIVER_TARGET_OUTCOMES map. The three first-party names keep
  their exact produced-outcome sets, so NO existing workflow / fixture / golden
  vector changes verdict; an unknown deliverer name now produces nothing (new
  guard, tested).

Tests: test_workflow_deliverers.py (registry contract) + a rule-11
unknown-deliverer case. agent 932 green; ty + ruff clean. deliver_artifact
remains a NotImplementedError stub until Phase 3 — only the contract is frozen.

* feat(api,agent): repo-optional admission for repo-less workflows (aws-samples#248 Phase 3)

First slice of Phase 3 (repo-optional tasks): the admission path now accepts a
submission with no repo when the resolved workflow does not require one, so a
repo-less knowledge workflow (e.g. default/agent-v1) can be admitted, hydrated,
and run from task_description alone. The deliverer/S3 + agent-runtime clone-skip
slices follow separately.

Semantics: requires_repo decides whether a repo is MANDATORY; requires_repo:false
means repo-OPTIONAL (a repo-less workflow may still target a repo). The repo-less
behavior keys off repo *absence*, not the workflow flag — a repo-optional workflow
given a repo still takes the repo-bound path.

Types (CDK + CLI mirror, kept in sync — check:types-sync green):
- CreateTaskRequest.repo, TaskRecord.repo → optional; TaskDetail.repo,
  TaskSummary.repo → string | null. Mappers null-coalesce.

Admission (create-task-core.ts): resolve the workflow FIRST, then gate repo on
workflow.requiresRepo (missing repo on a repo-bound workflow → 400). Onboarding +
blueprint-cap lookup runs only when a repo is present; repo-less takes the platform
default cap. Record omits repo when absent; replay-required-fields drops 'repo';
event metadata null-coalesces.

Hydration (context-hydration.ts): new repo-less branch (keyed on !task.repo) that
assembles from task_description + per-user memory, returning before the repo-bound
fetch path; assembleUserPrompt drops the Repository: line when repo is absent.

Memory (memory.ts + orchestrator.ts): loadMemoryContext/writeMinimalEpisode take
an actorNamespace (repo for coding, user:{user_id} for repo-less — ADR-014 addendum).

Preflight already honored requiresRepo (Phase-1 seam) — widened its repo param to
string|undefined. fanout-task-events skips the GitHub channel for a repo-less task.
loadBlueprintConfig skips the RepoTable lookup when repo is absent.

Agent (config.py): build_config loads the workflow up-front and requires
repo_url/github_token only when requires_repo; repo-less runs from
task_description. CLI shows '—' for a repo-less task.

Tests: repo-less acceptance (create-task-core, create-task, context-hydration,
CLI format) + repo-bound-missing-repo rejection; memory test log key repo →
actor_namespace. agent 932 / cdk 1825 / cli 296 green; ty + ruff + eslint clean.

deliver_artifact is still a NotImplementedError stub — a repo-less task can be
admitted and hydrated but not yet deliver an artifact (next slice).

* feat(agent): repo-less pipeline path via the workflow runner (aws-samples#248 Phase 3)

Second Phase-3 slice: a repo-less workflow now runs the agent loop in-container
with no clone / build / PR. When config.requires_repo is false, run_task delegates
to _run_repoless_task, which drives the workflow's steps (hydrate_context →
run_agent) through the workflow step runner — coding vs knowledge now differ by
the workflow's steps list, per ADR-014's end-state — and assembles a terminal
TaskResult with pr_url=None.

Scope boundary (deliberate): only hydrate_context + run_agent are driven here.
deliver_artifact is the workflow's terminal step but its handler is a stub until
the artifacts-bucket S3/IAM contract ships (next slice), so it is excluded via
only_kinds; delivery is deferred, the agent run is not. The coding path is left
byte-identical — its hard-won inline cancel-safety (skip ensure_pr on a cancelled
task) is NOT yet runner-driven, so routing the full coding step list through the
runner is a separate cancel-aware change, intentionally not done here.

- models.py: TaskConfig.requires_repo (defaults True; coding unaffected).
  config.py build_config sets it from the resolved workflow.
- pipeline.py: requires_repo branch before the repo-coupled segment;
  _run_repoless_task runs the runner + assembles/persists the terminal result.
  run_task repo_url now defaults to "" (repo-less callers omit it).
- prompts/default_agent.py: repo-less system prompt (no git/branch/PR
  placeholders); registered for default/agent-v1.
- prompt_builder.build_repoless_system_prompt + shared _render_memory_context.

Tests: repo-less pipeline run (no setup_repo, no PR, success, repo-less prompt);
repo-less prompt has no repo placeholders. agent 934 green; ty + ruff clean.

* fix(agent): address PR-review findings on Phase 2a/3 (aws-samples#248)

Findings from the plugin review (code-reviewer, silent-failure-hunter,
pr-test-analyzer, type-design-analyzer) on 4c34c8b..HEAD.

HIGH — repo-less task reported COMPLETED while delivering nothing
(silent-failure-hunter). _run_repoless_task skips deliver_artifact (stub until
the artifacts contract ships), but the workflow's terminal outcome (e.g.
`comment`) was never produced, yet status was success. Now: if the primary
terminal outcome is delivery-backed and delivery was skipped, the task is a loud
FAILED with an explicit error + delivery_deferred milestone naming the gap.

MEDIUM — config.py load-failure fallback bug + fail-open (code-reviewer,
silent-failure-hunter). The WorkflowValidationError branch compared workflow_id
against DEFAULT_WORKFLOW_ID (the *coding* default) for requires_repo, contradicting
its comment, and defaulted read_only=False (fail-open) for unknown ids. Now:
log the failure (was silent); fail CLOSED — read_only=True for any id not in
_KNOWN_WRITEABLE_WORKFLOW_IDS; requires_repo keyed off the real repo-less default
(REPO_LESS_DEFAULT_WORKFLOW_ID = default/agent-v1).

MEDIUM — TaskConfig missing cross-field invariant (type-design-analyzer). Added
_validate_requires_repo_has_repo (mirrors _validate_trace_requires_user_id):
requires_repo=True ⇒ non-empty repo_url, enforced at construction. repo_url and
github_token now default to "" so a repo-less TaskConfig is constructible; the
validator preserves the repo-bound invariant.

LOW/secondary — rule-8 now also checks deliver_artifact.target resolves in
DELIVERERS (type-design-analyzer), so an unknown target is caught universally,
not only when it collides with the primary outcome.

Test gaps closed (pr-test-analyzer): repo-OPTIONAL workflow given a valid repo
(admission + hydration); repo-less memory keyed user:{user_id} (asserted);
malformed-repo-on-repo-bound rejection; preflight repo-less short-circuit +
repo-bound-no-repo invariant; repo-less agent_result=None → FAILED; config
load-failure fail-closed; TaskConfig validator; rule-8 deliver target.

agent 941 / cdk 1831 / cli 296 green; ty + ruff + type-sync clean.

* feat(agent,api): deliver_artifact + repo-less memory — close criterion aws-samples#4 (aws-samples#248 Phase 3)

Final Phase-3 slice: a repo-less knowledge task now runs end-to-end — admitted,
hydrated, agent loop, AND delivers its output. Closes acceptance criterion aws-samples#4
(a non-coding task runs without a repo) for the s3/comment deliverers.

Infra (IAM + env):
- agent-session-role.ts: s3:PutObject grant scoped to
  artifacts/${aws:PrincipalTag/task_id}/* on the trace-artifacts bucket (mirrors
  the traces/ pattern; task_id-scoped per ADR-014 addendum). cdk-nag reason updated.
- agent.ts: ARTIFACTS_BUCKET_NAME runtime env (same bucket as traces).

Deliverers (agent/src/workflow/deliverers.py):
- DELIVERERS gains a deliver(target, ctx) dispatcher + DeliveryResult. The s3
  deliverer uploads the agent's result text to artifacts/{task_id}/result.md via
  the tenant-scoped S3 client (5 MiB cap); the comment deliverer records a
  delivered_comment milestone for the channel fanout; s3_and_comment does both.
- AgentResult.result_text captures ResultMessage.result (the knowledge-task
  deliverable); runner.py populates it on success.
- runner._handle_deliver_artifact: now implemented (was NotImplementedError),
  routes through deliver(); a delivery failure is a failed step (terminal FAILED),
  never a silent skip.

Pipeline (pipeline.py):
- _run_repoless_task drives the FULL step list (deliver_artifact included);
  replaces the delivery-deferred FAILED gate with the real runner verdict (agent
  succeeded but workflow failed ⇒ attributed FAILED). Sets TaskResult.artifact_uri.
- Repo-less episodic memory write keyed user:{user_id} (fail-open).

Memory (memory.py): write_task_episode repo→actor param; _validate_actor accepts
owner/repo OR user:{id} so the same write path serves coding + knowledge tasks.

Types: TaskResult/TaskRecord/TaskDetail + CLI mirror gain artifact_uri (mirrors
trace_s3_uri); CLI detail view shows an Artifact: line. check:types-sync green.

Tests: deliverer dispatch (s3 upload + key/body, comment milestone, s3_and_comment,
missing bucket, empty text, oversize); repo-less pipeline now succeeds + sets
artifact_uri=null for comment-only; session-role artifacts grant; deliver_artifact
dropped from the fail-loud stub list. agent 947 / cdk 1832 / cli 296 green; synth +
cdk-nag clean; ty + ruff + type-sync clean.

* feat(agent): make repo-less delivery retrievable + ship a knowledge workflow (aws-samples#248 Phase 3)

Three should-fix items closing the gap where "criterion aws-samples#4" was met by machinery
but the default repo-less path delivered to a channel that isn't wired.

1. Stale docstring: _run_repoless_task said deliver_artifact was a deferred stub
   — it ships now. Updated to describe the real full-step-list run.

2. Default workflow delivers retrievably. The channel fan-out only knows
   slack/email/github; an api-origin repo-less task (the common default case) has
   no channel, so a `comment`-only deliverer surfaced nothing the user could get.
   default/agent-v1 now uses `target: s3_and_comment` (primary outcome `artifact`):
   the S3 upload to artifacts/{task_id}/ is always retrievable, the comment
   milestone is still rendered by the fan-out when a channel exists.

3. Ship a runnable knowledge workflow: agent/workflows/knowledge/web-research-v1.yaml
   — the concrete repo-less demonstration (research → S3 artifact, no repo).
   Built-in capabilities only (registry MCP/skills are Phase 4); CDK descriptor
   mirror + drift-guard + resolver tests updated.

Also fixes a latent bug the knowledge workflow surfaced: a repo-less id with no
registered prompt fell back to the CODING prompt (leaking unsubstitutable
{repo_url}/{branch_name}). get_system_prompt gains repo_less= so repo-less
workflows fall back to the repo-less default-agent prompt instead.

Tests: repo-less pipeline now asserts artifact_uri set (s3_and_comment);
repo-less prompt fallback; knowledge workflow validates/resolves. agent 949 /
cdk 1833 green; ty + ruff + type-sync clean; docs synced.

* fix(agent): address full-branch review — 2 repo-less blockers + cleanups (aws-samples#248)

Findings from the four-reviewer full-branch pass. Two were blockers that would
have shipped the repo-less feature broken despite green unit tests (the tests
didn't cross the server/task_state boundaries).

BLOCKER 1 — repo-less task rejected on the AgentCore backend. server.py
_validate_required_params required repo_url unconditionally, so /invocations
returned 400 for every repo-less task before the pipeline ran (ECS backend
dodged it by calling run_task directly). Now gates repo_url on the workflow's
requires_repo (resolved via load_workflow; fails SAFE — assume required — on a
load error). Test: repo-less payload accepted, repo-bound still requires repo.

BLOCKER 2 — artifact_uri computed but never persisted. task_state.write_terminal's
field allowlist had no artifact_uri branch, so the URI was dropped before
DynamoDB and TaskDetail.artifact_uri was always null (delivery not retrievable
despite the S3 object existing). Added the persist branch + tests.

Secondary:
- comment deliverer no longer overstates: _post_comment docstring + WORKFLOWS.md
  state plainly that delivered_comment is recorded for the event stream (visible
  in `bgagent watch`) but is NOT yet routed to an external channel (not in the
  fan-out ROUTABLE_MILESTONES). comment_posted=True only when actually recorded.
- test pins the config→engine read_only seam (runner.py PolicyEngine(read_only=)):
  dropping it now fails a test instead of silently disabling the Write/Edit deny.
- stale phase-boundary comments corrected (post_review docstring, runner module
  docstring + only_kinds + run_agent guard, pr-review/pr-iteration YAML headers)
  — they described Phase 2a/2b/3 as pending when those shipped on this branch.
- ADR-014: replaced the dangling pre-rebase commit hash 838c72e (unreachable
  after rebase) with a descriptive reference.

agent 954 / cdk 1833 green; ty + ruff + type-sync clean; docs synced.

* test(cli): cover artifact_uri render lines (aws-samples#248 Phase 3)

The Artifact: display lines added to formatTaskDetail / formatStatusSnapshot
introduced uncovered truthy branches that dropped CLI branch coverage below the
84% gate (full `mise run build` caught it; the per-package run had passed on the
prior content). Adds non-null + null cases for both renderers, mirroring the
existing trace_s3_uri tests. cli 300 passed; full build green.

* fix(agent): address PR review findings + green CI (aws-samples#248)

CI: ruff-format the 4 files that tripped the build self-mutation
(config.py, deliverers.py, test_pipeline.py, test_task_state.py) so
"Fail build on mutation" passes.

Review findings:
- prompts/__init__.py: get_system_prompt now logs WARN when an id has no
  registered prompt and falls back, restoring the signal the old
  build_system_prompt emitted (silent-failure-hunter).
- pipeline._run_repoless_task: chain wf_result.failed_step.error into the
  synthesized AgentResult so a run_agent failure on the repo-less path is
  diagnosable instead of a generic message (code-reviewer).
- server._validate_required_params: narrow the broad `except Exception`
  around load_workflow to WorkflowValidationError/ImportError so a genuine
  programming error surfaces loudly instead of being masked as "could not
  resolve requires_repo" (still fails safe — repo required).
- workflows.ts workflowIsReadOnly: document why the unknown-id default is
  `false` (conservative admission: needsWrite=!readOnly) and must NOT be
  "aligned" to the agent-side fail-closed read-only default.
- runner.py / pr-review-v1.yaml: comment fixes — artifact_key→artifact_uri,
  drop never-written review_posted, rule 4→6, post_review handler is
  registered-but-unimplemented not "no shipped handler" (comment-analyzer).

Tests (pr-test-analyzer gaps):
- test_memory.py: cover _validate_actor (user:{id} namespace + rejects) and
  a write_task_episode(actor="user:...") reaching create_event.
- test_pipeline.py: cover the repo-less delivery-gate-failure branch (agent
  succeeds but deliver_artifact fails → terminal FAILED naming the step).

Docs: ROADMAP.md "Task types" now reflects workflow-driven tasks + repo-less
knowledge workflows; Starlight mirror regenerated.

agent 962 passed, ruff+ty clean, cdk compile+workflows tests green.

* fix(agent): ship first-party workflow files in the image + sync user docs (aws-samples#248)

The Dockerfile copied agent/src, agent/policies, and contracts but never
agent/workflows, so the runtime loader (which resolves /app/workflows/<domain>/
<name>-vN.yaml) failed every workflow load with WorkflowValidationError. Add
COPY agent/workflows/ /app/workflows/ so the five shipped workflows + the JSON
schema land in the image. Verified in a local build: load_workflow(
'coding/new-task-v1') resolves to 1.0.0.

Also bring the user-facing guides in line with the workflow-driven model:
USER_GUIDE/QUICK_START now document workflow_ref, the --workflow flag, and
resolved_workflow responses (the retired task_type is shown only as a migration
mapping). Rename the "Task types" section to "Workflows", update the Starlight
sync map + sidebar slug, and regenerate the mirrors.

* fix(agent,api,cli): address PR-review findings on the workflow seam (aws-samples#248)

Resolve five verified findings from the aws-samples#248 review:

HIGH — thread agent_config.allowed_tools to the SDK. runner.py hardcoded the
full tool surface, so a workflow's declared allowed_tools never reached
ClaudeAgentOptions and the read_only:false read-leaning lanes (default/agent-v1,
web-research-v1) could still get Write/Bash. Add TaskConfig.allowed_tools
(populated from the resolved workflow), resolve it in run_agent via
_resolve_allowed_tools (drops Write/Edit when read_only), and add SDK-layer
tests.

Cedar missing-read_only fail-open — add the ADR-014-promised parity vector.
With context omitting read_only, both engines fail OPEN (Allow + base_permit);
safety rests on policy.py::_eval_tier re-raising on diagnostics.errors. Add
contracts/cedar-parity/read-only-missing-attribute.json (verified on cedarpy
AND cedar-wasm) plus test_policy.py tests pinning the re-raise + always-inject
discipline.

MEDIUM — repo-less artifact success gate. pipeline.py now fails the task when
the primary outcome is `artifact` but no artifact_uri was produced (matches the
WORKFLOWS.md "agent-success AND S3 key present" contract), not just when the
deliver step raised.

MEDIUM — rule 13 model allow-list admission. Add WORKFLOW_MODEL_ALLOWLIST +
disallowedWorkflowModel() and enforce it in create-task-core (unpermitted model
=> 400, no silent downgrade); descriptor sync-test asserts modelId matches the
YAML and stays on the allow-list.

LOW — CLI repo-less submission. --repo is no longer a hard requiredOption:
required unless --workflow selects a repo-less workflow (still required with
--pr/--review-pr). Lets bgagent reach knowledge/web-research-v1.

Rule 10 (one production version per lineage) remains intentionally unenforced
at the single-file layer — it is a registry/promotion-time property.

* fix(agent,api,cli): address PR-review aws-samples#296 resolution-ladder + deliverer findings (aws-samples#248)

Resolve the remaining findings from isadeks's review (findings #2/aws-samples#4 were
already fixed in the prior commit):

#1 — bare submit silently stopped opening PRs. The server ladder resolves an
absent workflow_ref to the repo-less default/agent-v1 (by design), so a CLI
`submit --repo X --task Y` regressed from "opens a PR" to "S3 artifact". The CLI
now sends coding/new-task-v1 explicitly when a repo is present and no workflow/
pr flag is given, preserving the old new_task default.

aws-samples#3 — repo-optional + repo disagreed across the boundary. The agent took the
repo-less path off config.requires_repo alone, while the orchestrator built a
repo-bound prompt off repo presence. pipeline.run_task now takes the repo-less
path only when requires_repo is false AND no repo was supplied; a repo-optional
workflow given a repo runs the repo-bound path (clone/build/PR), matching CDK
admission and the prompt.

aws-samples#6 — resolveWorkflowRef silently discarded a @Version constraint. It now honors
the constraint (Phase 1: must match the single shipped version) and returns
null otherwise; create-task-core 400s with a distinct "version not available"
message via resolveWorkflowRefError, instead of quietly running the shipped
version.

aws-samples#7 — deliver_artifact unset-target default disagreed between validator (lenient
full set) and runtime (s3). Introduce DEFAULT_DELIVER_TARGET as the single
source of truth; produced_outcomes(None) now models that exact default, so a
primary:comment workflow that omits target is correctly flagged by rule 11.

aws-samples#10 — descriptor/YAML drift had no parity test. The sync test now asserts
required_inputs (allOf/oneOf) parity in addition to id/version/requires_repo/
read_only/model, and a new test cross-checks the agent-side
_KNOWN_WRITEABLE_WORKFLOW_IDS (the third hand-maintained copy) against the
writeable repo-bound descriptors so drift fails CI.

* fix(agent): close PR-review aws-samples#296 follow-ups — post-hook fail-soft, prompt, cap, parity (aws-samples#248)

aws-samples#5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload
failure AFTER run_agent mutated/committed the tree stranded the work as FAILED
with no PR. read_only now comes from config (already fallback-computed and the
verdict that drove Cedar); the reload — needed only for the ensure_pr strategy —
is wrapped in the same WorkflowValidationError fallback build_config uses,
defaulting to "create" so the PR still opens.

aws-samples#8 — knowledge/web-research-v1 had no registered prompt and silently degraded to
the generic default-agent prompt. Add a research-specialized, repo-less prompt
(prompts/web_research.py) and register it.

aws-samples#9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full
copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap
actually bounds memory on the MicroVM before materializing the bytes.

aws-samples#10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID
/ DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform
default and be requires_repo:false in the YAML; the coding default must be
repo-bound. Closes the last hand-maintained-copy axis aws-samples#10 named.

Perf — memoize load_workflow (@cache): files are image-baked and immutable per
process and Workflow is frozen, so the 3-4 loads per task now parse once.

Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent
while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a
larger half-migrated-runner refactor both reviewers flagged as non-blocking.

---------

Co-authored-by: bgagent <bgagent@noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants